Abstract

In a Hilbert setting, we introduce a new dynamical system and associated algorithms
for solving monotone inclusions by rapid methods. Given a maximal monotone operator
$A$, the evolution is governed by the time dependent operator $I - (I + \lambda(t)A)^{-1}$, where the
positive control parameter $\lambda(t)$ tends to infinity as $t \to +\infty$. The tuning of $\lambda(\cdot)$ is done in
a closed-loop way, by resolution of the algebraic equation $\lambda\|(I + \lambda A)^{-1}x - x\| = \theta$, where
$\theta$ is a positive given constant. The existence and uniqueness of a strong global solution
for the Cauchy problem follows from the Cauchy-Lipschitz theorem. We prove the weak
convergence of the trajectories to equilibria, and superlinear convergence under an error
bound condition. When $A = \partial f$ is the subdifferential of a closed convex function $f$, we
show an $O(1/t^2)$ convergence property of $f(x(t))$ to the infimal value of the problem. Then,
we introduce proximal-like algorithms which can be obtained by time discretization of the
continuous dynamic, and which share the same fast convergence properties. As distinctive
features, we allow a relative error tolerance for the solution of the proximal subproblem
similar to the ones proposed in [19, 20], and a large step condition, as proposed in [?].
For general convex minimization problems, the complexity is $O(1/n^2)$. In the regular case,
we show the global quadratic convergence of an associated proximal-Newton method.
Introduction


Let $\mathcal{H}$ be a real Hilbert space, and $A : \mathcal{H} \rightrightarrows \mathcal{H}$ be a maximal monotone operator. The space
$\mathcal{H}$ is endowed with the scalar product $\langle \cdot, \cdot \rangle$, with $\|x\|^2 = \langle x, x \rangle$ for any $x \in \mathcal{H}$. Our goal is to
develop new continuous and discrete dynamics, with properties of fast convergence, designed
to solve the equation

    find $x \in \mathcal{H}$ such that $0 \in Ax$.    (1)

We start from the classical method, which consists in formulating (1) as a fixed point problem:

    find $x \in \mathcal{H}$ such that $x - (I + \lambda A)^{-1}x = 0$,    (2)

where $\lambda > 0$ is a positive parameter, and $(I + \lambda A)^{-1}$ is the resolvent of index $\lambda$ of $A$ (recall
that the resolvents are nonexpansive mappings from $\mathcal{H}$ into $\mathcal{H}$). Playing on the freedom of
choice of the parameter $\lambda > 0$, we are led to consider the evolution problem:

    $\dot{x}(t) + x(t) - (I + \lambda(t)A)^{-1}x(t) = 0$.    (3)

When $\lambda(\cdot)$ is locally absolutely continuous, this differential equation falls within the scope of the
Cauchy-Lipschitz theorem. Then, the strategy is to choose a control variable $t \mapsto \lambda(t)$ which gives
good properties of asymptotic convergence of (3). In standard methods for solving monotone
inclusions, the parameter $\lambda(t)$ ($\lambda_k$ in the discrete algorithmic case) is prescribed to stay
bounded away from zero and infinity. By contrast, our strategy is to let $\lambda(t)$ tend to $+\infty$ as
$t \to +\infty$. This will be a crucial ingredient for obtaining fast convergence properties. But the
precise tuning of $\lambda(\cdot)$ in such an open-loop way is a difficult task, and the open-loop approach
raises numerical difficulties. Instead, we consider the following system (4) with variables
$(x, \lambda)$, where the tuning is done in a closed-loop way via the second equation of (4) ($\theta$ is a
fixed positive parameter):

    (LSP)  $\begin{cases} \dot{x}(t) + x(t) - (I + \lambda(t)A)^{-1}x(t) = 0, \quad \lambda(t) > 0, \\ \lambda(t)\|(I + \lambda(t)A)^{-1}x(t) - x(t)\| = \theta. \end{cases}$    (4)

Note that $\lambda(\cdot)$ is an unknown function, which is obtained by solving this system. When the
system is asymptotically stabilized, i.e., $\dot{x}(t) \to 0$, then the second equation of (4) forces
$\lambda(t) = \theta / \|\dot{x}(t)\|$ to tend to $+\infty$ as $t \to +\infty$. Our main results can be summarized as follows:

In Theorem 2.4 we show that, for any given $x_0 \in \mathcal{H} \setminus A^{-1}(0)$, and $\theta > 0$, there exists a
unique strong (locally Lipschitz in time) global solution $t \mapsto (x(t), \lambda(t))$ of (4) which satisfies
the Cauchy data $x(0) = x_0$.

In Theorem 3.2 we study the asymptotic behaviour of the orbits of (4), as $t \to +\infty$.
Assuming $A^{-1}(0) \neq \emptyset$, we show that for any orbit $t \mapsto (x(t), \lambda(t))$ of (4), $\lambda(t)$ tends increasingly
to $+\infty$, and $w$-$\lim_{t \to +\infty} x(t) = x_\infty$ exists, for some $x_\infty \in A^{-1}(0)$. We complete these results
by showing in Theorem 3.5 the strong convergence of the trajectories under certain additional
properties, and in Theorem 3.3 superlinear convergence under an error bound assumption.

In Theorem 4.2 we show that (4) has a natural link with the regularized Newton dynamic,
which was introduced in [5]. In fact, the property that $\lambda(t)$ tends to $+\infty$ as $t \to +\infty$ is equivalent to the
convergence to zero of the coefficient of the regularization term (Levenberg-Marquardt type)
in the regularized Newton dynamic. Thus (4) is likely to share some of the nice convergence
properties of the Newton method.
In Theorem 5.6, when $A = \partial f$ is the subdifferential of a convex lower semicontinuous
proper function $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$, we show the $O(1/t^2)$ convergence property

    $f(x(t)) - \inf f \le \frac{C_1}{(1 + C_2 t)^2}$.

In Appendix A.2 we consider some situations where an explicit computation of the continuous
orbits can be made, and so confirm the theoretical results.

Then, we present new algorithms which can be obtained by time discretization of (4),
and which share similar fast convergence properties. We study the iteration complexity of a
variant of the proximal point method for optimization. Its main distinctive features are:

i) a relative error tolerance for the solution of the proximal subproblem similar to the ones
proposed in [19, 20], see also [3] in the context of semi-algebraic and tame optimization;

ii) a large step condition, as proposed in [?]. Let us notice that the usefulness of
letting the parameter tend to infinity in the case of the proximal algorithm was already
noticed by Rockafellar in [18] (in the case of a strongly monotone operator, he showed a
superlinear convergence property).

The cubic-regularized Newton method was first proposed in [9] and, after that, in [21]. As a
main result, in Theorem 6.4 we show that the complexity of our method is $O(1/n^2)$, the same
as the one of the cubic-regularized Newton method [14].

For smooth convex optimization we introduce a corresponding proximal-Newton method,
which has rapid global convergence properties (Theorem 7.5), and has quadratic convergence
in the regular case (Theorem 7.6).


1 Study of the algebraic relationship linking $\lambda$ and $x$


Let us fix $\theta > 0$ a positive parameter. We start by analyzing the algebraic relationship

    $\lambda\|(I + \lambda A)^{-1}x - x\| = \theta$,    (5)

that links variables $\lambda \in ]0, +\infty[$ and $x \in \mathcal{H}$ in the second equation of (4). Define

    $\varphi : [0, \infty[ \times \mathcal{H} \to \mathbb{R}^+$, $\varphi(\lambda, x) = \lambda\|x - (I + \lambda A)^{-1}x\|$ for $\lambda > 0$, $\varphi(0, x) = 0$.    (6)

We denote by $J_\lambda = (I + \lambda A)^{-1}$ the resolvent of index $\lambda > 0$ of $A$, and by $A_\lambda = \frac{1}{\lambda}(I - J_\lambda)$
its Yosida approximation of index $\lambda > 0$. To analyze the dependence of $\varphi$ with respect to $\lambda$
and $x$, we recall some classical facts concerning resolvents of maximal monotone operators.


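Proposition 1.1. For any $\lambda > 0$, $\mu > 0$, and any $x \in \mathcal{H}$, the following properties hold:

i) $J_\lambda : \mathcal{H} \to \mathcal{H}$ is nonexpansive, and $A_\lambda : \mathcal{H} \to \mathcal{H}$ is $\frac{1}{\lambda}$-Lipschitz continuous.    (7)

ii) $J_\lambda x = J_\mu\left(\frac{\mu}{\lambda}x + \left(1 - \frac{\mu}{\lambda}\right)J_\lambda x\right)$;    (8)

iii) $\|J_\lambda x - J_\mu x\| \le |\lambda - \mu|\,\|A_\lambda x\|$;    (9)

iv) $\lim_{\lambda \to 0} J_\lambda x = \mathrm{proj}_{\overline{D(A)}}\, x$;    (10)

v) $\lim_{\lambda \to +\infty} J_\lambda x = \mathrm{proj}_{A^{-1}(0)}\, x$, if $A^{-1}(0) \neq \emptyset$.    (11)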






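As a consequence, for any $x \in \mathcal{H}$ and any $0 < \delta < \Lambda < +\infty$, the function $\lambda \mapsto J_\lambda x$ is
Lipschitz continuous on $[\delta, \Lambda]$. More precisely, for any $\lambda, \mu$ belonging to $[\delta, \Lambda]$

    $\|J_\lambda x - J_\mu x\| \le |\lambda - \mu|\,\|A_\delta x\|$.    (12)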

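Proof. i) is a classical result, see [3, Proposition 2.2, 2.6].

ii) Equality (8) is known as the resolvent equation, see [?]. Its proof is straightforward: By
definition of $\xi = J_\lambda x$, we have

    $\xi + \lambda A\xi \ni x$,

which, after multiplication by $\frac{\mu}{\lambda}$, gives

    $\frac{\mu}{\lambda}\xi + \mu A\xi \ni \frac{\mu}{\lambda}x$.

By adding $\left(1 - \frac{\mu}{\lambda}\right)\xi$ to the two members of the above relation, we obtain

    $\xi + \mu A\xi \ni \frac{\mu}{\lambda}x + \left(1 - \frac{\mu}{\lambda}\right)\xi$,

which gives the desired equality (8).

iii) For any $\lambda > 0$, $\mu > 0$, and any $x \in \mathcal{H}$, by using successively the resolvent equation and
the nonexpansive property of the resolvents, we have

    $\|J_\lambda x - J_\mu x\| = \left\|J_\mu\left(\frac{\mu}{\lambda}x + \left(1 - \frac{\mu}{\lambda}\right)J_\lambda x\right) - J_\mu x\right\|$

    $\le \left\|\left(1 - \frac{\mu}{\lambda}\right)(x - J_\lambda x)\right\|$

    $\le |\lambda - \mu|\,\|A_\lambda x\|$.

Using that $\lambda \mapsto \|A_\lambda x\|$ is nonincreasing (see [3, Proposition 2.6]), we obtain (12).

iv) see [3, Theorem 2.2].

v) It is the viscosity selection property of the Tikhonov approximation, see [?]. □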
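
As a purely numerical illustration (not part of the original argument), the resolvent equation (8) and the estimate (9) can be checked on a simple linear example. The sketch below assumes the diagonal operator $A = \mathrm{diag}(a)$ with $a_i \ge 0$ on $\mathbb{R}^6$, for which $J_\lambda x = x / (1 + \lambda a)$ componentwise; the operator, the vector `x`, and the helper names `J` and `yosida` are illustrative choices, not objects from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 5.0, size=6)   # eigenvalues of a monotone diagonal operator (assumption)
x = rng.normal(size=6)

def J(lam, z):
    """Resolvent J_lam z = (I + lam*A)^{-1} z for A = diag(a)."""
    return z / (1.0 + lam * a)

def yosida(lam, z):
    """Yosida approximation A_lam z = (z - J_lam z) / lam."""
    return (z - J(lam, z)) / lam

lam, mu = 2.0, 0.7

# Resolvent equation (8): J_lam x = J_mu( (mu/lam) x + (1 - mu/lam) J_lam x ).
lhs = J(lam, x)
rhs = J(mu, (mu / lam) * x + (1.0 - mu / lam) * J(lam, x))
resolvent_gap = np.linalg.norm(lhs - rhs)

# Estimate (9): ||J_lam x - J_mu x|| <= |lam - mu| * ||A_lam x||.
lipschitz_ok = bool(
    np.linalg.norm(J(lam, x) - J(mu, x))
    <= abs(lam - mu) * np.linalg.norm(yosida(lam, x)) + 1e-12
)
```

For a diagonal operator both relations can also be verified by hand, which makes this a convenient sanity check before moving to operators whose resolvent has no closed form.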

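Let us first consider the mapping $x \mapsto \varphi(\lambda, x)$. Noticing that, for $\lambda > 0$, $\varphi(\lambda, x) =
\lambda^2\|A_\lambda x\|$, the following result is just the reformulation in terms of $\varphi$ of the $\frac{1}{\lambda}$-Lipschitz
continuity of $A_\lambda$.

Proposition 1.2. For any $x_1, x_2 \in \mathcal{H}$ and $\lambda > 0$,

    $|\varphi(\lambda, x_1) - \varphi(\lambda, x_2)| \le \lambda\|x_2 - x_1\|$.

The next result was proved in [?, Lemma 4.3] for finite dimensional spaces. Its proof for
arbitrary Hilbert spaces is similar and is provided for the sake of completeness.

Lemma 1.3. For any $x \in \mathcal{H}$ and $0 < \lambda_1 \le \lambda_2$,

    $\frac{\lambda_2}{\lambda_1}\varphi(\lambda_1, x) \le \varphi(\lambda_2, x) \le \left(\frac{\lambda_2}{\lambda_1}\right)^2\varphi(\lambda_1, x)$    (13)

and $\varphi(\lambda_1, x) = 0$ if and only if $0 \in A(x)$.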
Proof. Let $y_i = J_{\lambda_i}x$ and $v_i = A_{\lambda_i}x$ for $i = 1, 2$. In view of these definitions,

    $v_i \in A(y_i)$, $\lambda_i v_i + y_i - x = 0$, $i = 1, 2$.

Therefore,

    $\lambda_1(v_1 - v_2) + y_1 - y_2 = (\lambda_2 - \lambda_1)v_2$, $v_2 - v_1 + \lambda_2^{-1}(y_2 - y_1) = (\lambda_1^{-1} - \lambda_2^{-1})(y_1 - x)$.

Since $A$ is monotone, the inner products of both sides of the first equation by $v_1 - v_2$ and of
the second equation by $y_2 - y_1$ are non-negative. Since $\lambda_1 \le \lambda_2$,

    $\langle v_1 - v_2, v_2 \rangle \ge 0$, $\langle y_2 - y_1, y_1 - x \rangle \ge 0$, $\|v_1\| \ge \|v_2\|$, $\|y_2 - x\| \ge \|y_1 - x\|$.

The two inequalities in (13) follow from the two last inequalities in the above equation and
definition (6). The last part of the lemma follows trivially from the maximal monotonicity
of $A$ and definition (6). □
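
The two-sided bound (13) is easy to probe numerically. The following sketch, given only as an illustration, assumes again a diagonal operator $A = \mathrm{diag}(a)$ with $a_i \ge 0$ (a hypothetical test case, not taken from the text) and counts how many of a few pairs $(\lambda_1, \lambda_2)$ violate (13); a small relative slack absorbs rounding errors.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0.0, 4.0, size=5)   # A = diag(a), an illustrative monotone operator
x = rng.normal(size=5)

def phi(lam, z):
    """phi(lam, z) = lam * ||z - (I + lam*A)^{-1} z|| for A = diag(a)."""
    return lam * np.linalg.norm(z - z / (1.0 + lam * a))

violations = 0
for lam1, lam2 in [(0.1, 0.5), (0.5, 2.0), (1.0, 1.5), (2.0, 50.0)]:
    ratio = lam2 / lam1
    lower = ratio * phi(lam1, x)          # left-hand side of (13)
    upper = ratio ** 2 * phi(lam1, x)     # right-hand side of (13)
    value = phi(lam2, x)
    if not (lower <= value * (1 + 1e-12) and value <= upper * (1 + 1e-12)):
        violations += 1
```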

We can now analyze the properties of the mapping $\lambda \mapsto \varphi(\lambda, x)$. Without ambiguity, we
write shortly $J_\lambda$ for the resolvent of index $\lambda > 0$ of $A$.

Proposition 1.4. For any $x \notin A^{-1}(0)$, the function $\lambda \in [0, \infty[ \mapsto \varphi(\lambda, x) \in \mathbb{R}^+$ is continuous,
strictly increasing, $\varphi(0, x) = 0$, and $\lim_{\lambda \to +\infty}\varphi(\lambda, x) = +\infty$.

Proof. It follows from (6) and the first inequality in (13) with $\lambda_2 = 1$, $\lambda = \lambda_1 \le 1$ that

    $0 \le \limsup_{\lambda \to 0^+}\varphi(\lambda, x) \le \lim_{\lambda \to 0^+}\lambda\varphi(1, x) = 0$,

which proves continuity of $\lambda \mapsto \varphi(\lambda, x)$ at $\lambda = 0$. Note that this also results from Proposition
1.1 iv). Since $0 \notin A(x)$, it follows from the last statement in Lemma 1.3 and the first
inequality in (13) that $\lambda \mapsto \varphi(\lambda, x)$ is strictly increasing, and that $\lim_{\lambda \to \infty}\varphi(\lambda, x) = +\infty$.
Left-continuity and right-continuity of $\lambda \mapsto \varphi(\lambda, x)$ follow from the first and the second
inequality in (13). □

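In view of Proposition 1.4, if $0 \notin A(x)$ there exists a unique $\lambda > 0$ such that $\varphi(\lambda, x) = \theta$.
It remains to analyze how such a $\lambda$ depends on $x$. Define, for $\theta > 0$

    $\Omega = \mathcal{H} \setminus A^{-1}(0)$,

    $\Lambda_\theta : \Omega \to ]0, \infty[$, $\Lambda_\theta(x) = (\varphi(\cdot, x))^{-1}(\theta)$.    (14)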
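
Since $\varphi(\cdot, x)$ is continuous, strictly increasing, and unbounded, the value $\Lambda_\theta(x)$ can be approximated by any bracketing root finder. The sketch below uses plain bisection; it is an illustrative assumption rather than the procedure used in the text, and it takes $A = \mathrm{diag}(a)$ on $\mathbb{R}^3$ so that the resolvent is explicit. The names `resolvent`, `phi`, and `closed_loop_lambda` are hypothetical helpers.

```python
import numpy as np

def resolvent(lam, a, x):
    """J_lam x = (I + lam*A)^{-1} x for the diagonal operator A = diag(a)."""
    return x / (1.0 + lam * a)

def phi(lam, a, x):
    """phi(lam, x) = lam * ||x - J_lam x||, as in definition (6)."""
    return lam * np.linalg.norm(x - resolvent(lam, a, x))

def closed_loop_lambda(a, x, theta, tol=1e-12):
    """Approximate the unique lam > 0 with phi(lam, x) = theta by bisection.

    Assumes x is not a zero of A, so that phi(., x) is strictly increasing
    from 0 to +infinity (Proposition 1.4).
    """
    lo, hi = 0.0, 1.0
    while phi(hi, a, x) < theta:      # enlarge the bracket until phi(hi) >= theta
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if phi(mid, a, x) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a = np.array([0.5, 1.0, 3.0])         # illustrative data
x = np.array([1.0, -2.0, 0.5])
theta = 1.0
lam = closed_loop_lambda(a, x, theta)
residual = abs(phi(lam, a, x) - theta)
```

Bisection is chosen here only because monotonicity makes it unconditionally reliable; a faster bracketing method would serve equally well.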


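Observe that $\Omega$ is open. More precisely,

    $\left\{ z \in \mathcal{H} \;:\; \|z - x\| < \frac{\theta}{\Lambda_\theta(x)} \right\} \subset \Omega, \quad \forall x \in \Omega$.    (15)

To prove this inclusion, suppose that $\|z - x\| < \theta / \Lambda_\theta(x)$. By the triangle inequality and
Proposition 1.2 we have

    $\varphi(\Lambda_\theta(x), z) \ge \varphi(\Lambda_\theta(x), x) - |\varphi(\Lambda_\theta(x), z) - \varphi(\Lambda_\theta(x), x)| \ge \theta - \Lambda_\theta(x)\|z - x\| > 0$.

Hence, $z \notin A^{-1}(0)$.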
Function $\Lambda_\theta$ allows us to express (4) as an autonomous ODE:

    $\begin{cases} \dot{x}(t) + x(t) - (I + \Lambda_\theta(x(t))A)^{-1}x(t) = 0; \\ x(0) = x_0. \end{cases}$    (16)


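In order to study the properties of the function $\Lambda_\theta$, it is convenient to define

    $\Gamma_\theta(x) = \min\{\alpha > 0 \mid \|x - (I + \alpha^{-1}A)^{-1}x\| \le \alpha\theta\}$.    (17)

Lemma 1.5. The function $\Gamma_\theta : \mathcal{H} \to \mathbb{R}^+$ is Lipschitz continuous with constant $1/\theta$ and

    $\Gamma_\theta(x) = \begin{cases} 1/\Lambda_\theta(x), & \text{if } x \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$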

Proof. The first inequality in (13) is equivalent to saying that $\lambda \mapsto \|x - (I + \lambda A)^{-1}x\|$ is a
non-decreasing function. Therefore, $\alpha \mapsto \|x - (I + \alpha^{-1}A)^{-1}x\|$ is a (continuous) non-increasing
function. As a consequence, the set

    $\{\alpha > 0 \mid \|x - (I + \alpha^{-1}A)^{-1}x\| \le \alpha\theta\}$

is always a nonempty interval, and $\Gamma_\theta$ is a real-valued non-negative function. The relationship
between $\Gamma_\theta(x)$ and $\Lambda_\theta(x)$ is straightforward: by definition, if $x \in \Omega$

    $\Gamma_\theta(x) = \min\left\{\alpha > 0 \;\middle|\; \frac{1}{\alpha}\left\|x - \left(I + \frac{1}{\alpha}A\right)^{-1}x\right\| \le \theta\right\}$

    $= \frac{1}{\sup\{\lambda \mid \lambda\|x - (I + \lambda A)^{-1}x\| \le \theta\}}$

    $= 1/\Lambda_\theta(x)$.

Moreover, if $x \in A^{-1}(0)$, then for any $\alpha > 0$, $x - (I + \alpha^{-1}A)^{-1}x = 0$, and $\Gamma_\theta(x) = 0$.

Let us now show that $\Gamma_\theta$ is Lipschitz continuous. Take $x, y \in \mathcal{H}$ and $\alpha > 0$. Suppose that
$\|x - (I + \alpha^{-1}A)^{-1}x\| \le \alpha\theta$. We use that $x \mapsto \|x - (I + \lambda A)^{-1}x\|$ is nonexpansive (a consequence
of the equality $\|x - (I + \lambda A)^{-1}x\| = \|\lambda A_\lambda x\|$ and Proposition 1.1, item i)). Hence

    $\|y - (I + \alpha^{-1}A)^{-1}y\| \le \|x - (I + \alpha^{-1}A)^{-1}x\| + \|y - x\|$

    $\le \alpha\theta + \|y - x\|$.

Let $\beta = \alpha + \|y - x\|/\theta$. Since $\beta \ge \alpha$, by using again that $\lambda \mapsto \|x - (I + \lambda^{-1}A)^{-1}x\|$ is a
non-increasing function,

    $\|y - (I + \beta^{-1}A)^{-1}y\| \le \|y - (I + \alpha^{-1}A)^{-1}y\| \le \beta\theta$.

By definition of $\Gamma_\theta$, we deduce that $\Gamma_\theta(y) \le \beta = \alpha + \|y - x\|/\theta$. This being true for any
$\alpha \ge \Gamma_\theta(x)$, it follows that $\Gamma_\theta(y) \le \Gamma_\theta(x) + \|y - x\|/\theta$. Since the same inequality holds by
interchanging $x$ with $y$, we conclude that $\Gamma_\theta$ is $1/\theta$-Lipschitz continuous. □
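
The Lipschitz bound of Lemma 1.5 can be observed experimentally. The sketch below is an illustration under stated assumptions: $A = \mathrm{diag}(a)$ with $a_i > 0$ on $\mathbb{R}^3$ (so every nonzero point lies in $\Omega$), $\Lambda_\theta$ computed by bisection (the hypothetical helper `Lambda`), and random sample pairs. It records the largest observed value of $\theta\,|\Gamma_\theta(x) - \Gamma_\theta(y)| / \|x - y\|$, which the lemma bounds by 1.

```python
import numpy as np

a = np.array([0.5, 1.0, 3.0])   # A = diag(a): an illustrative choice, not from the text
theta = 0.8

def phi(lam, x):
    """phi(lam, x) = lam * ||x - J_lam x|| with J_lam x = x / (1 + lam*a)."""
    return lam * np.linalg.norm(x - x / (1.0 + lam * a))

def Lambda(x, tol=1e-13):
    """Unique root of phi(., x) = theta, located by bisection."""
    lo, hi = 0.0, 1.0
    while phi(hi, x) < theta:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if phi(mid, x) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(200):
    x, y = rng.normal(size=3), rng.normal(size=3)
    gap = abs(1.0 / Lambda(x) - 1.0 / Lambda(y))   # |Gamma(x) - Gamma(y)| on Omega
    worst = max(worst, gap * theta / np.linalg.norm(x - y))
```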
Observe that in (4)

    $\lambda(t) = \Lambda_\theta(x(t))$, $\dot{x}(t) = J_{\Lambda_\theta(x(t))}x(t) - x(t)$.

We are led to study the vector field $F$ governing this ODE,

    $F : \Omega \to \mathcal{H}$, $F(x) = J_{\Lambda_\theta(x)}x - x$.    (18)

Proposition 1.6. The vector field $F$ is locally Lipschitz continuous.


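Proof. Take $x_0 \in \Omega$ and $0 < r < \theta/\Lambda_\theta(x_0)$. Set $\lambda_0 = \Lambda_\theta(x_0)$. By (15) we have $B(x_0, r) \subset \Omega$.
In view of the choice of $r$ and Lemma 1.5, for any $x \in B(x_0, r)$

    $0 < \frac{\theta - \lambda_0 r}{\lambda_0\theta} \le \Gamma_\theta(x) \le \frac{\theta + \lambda_0 r}{\lambda_0\theta}$.    (19)

Take $x, y \in B(x_0, r)$ and let

    $\lambda = \Lambda_\theta(x)$, $\mu = \Lambda_\theta(y)$.

By using that $x \mapsto x - J_\lambda(x)$ is nonexpansive, and the resolvent equation (Proposition 1.1,
item iii)), we have

    $\|F(x) - F(y)\| = \|J_\lambda x - x - (J_\mu y - y)\|$

    $\le \|J_\lambda x - x - (J_\lambda y - y)\| + \|J_\lambda y - J_\mu y\|$

    $\le \|x - y\| + |\lambda - \mu|\frac{\|J_\mu y - y\|}{\mu}$

    $= \|x - y\| + \frac{|\lambda - \mu|}{\mu^2}\theta$,

where the last equality follows from the definition of $\mu$ and (14). Using Lemma 1.5 we have

    $\frac{|\lambda - \mu|}{\mu^2}\theta = \theta\left|\frac{1}{\mu} - \frac{1}{\lambda}\right|\frac{\lambda}{\mu} = \theta\,|\Gamma_\theta(y) - \Gamma_\theta(x)|\,\frac{\Gamma_\theta(y)}{\Gamma_\theta(x)} \le \frac{\Gamma_\theta(y)}{\Gamma_\theta(x)}\|y - x\|$.

In view of (19),

    $\frac{\Gamma_\theta(y)}{\Gamma_\theta(x)} \le \frac{\theta + \lambda_0 r}{\theta - \lambda_0 r}$.

Combining the three above results, we conclude that

    $\|F(x) - F(y)\| \le \left(1 + \frac{\theta + \lambda_0 r}{\theta - \lambda_0 r}\right)\|x - y\| = \frac{2\theta}{\theta - \lambda_0 r}\|x - y\|$,

which is the desired result. □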
2 Existence and uniqueness of a global solution


Given $x_0 \in \mathcal{H} \setminus A^{-1}(0)$, we study the Cauchy problem

    $\begin{cases} \dot{x}(t) + x(t) - (I + \lambda(t)A)^{-1}x(t) = 0, \quad \lambda(t) > 0, \\ \lambda(t)\|(I + \lambda(t)A)^{-1}x(t) - x(t)\| = \theta, \\ x(0) = x_0. \end{cases}$    (20)

Note that the assumption $x_0 \in \Omega = \mathcal{H} \setminus A^{-1}(0)$ is not restrictive, since when $x_0 \in A^{-1}(0)$,
the problem is already solved. Following the results of the previous section, (20) can be
equivalently formulated as an autonomous ODE, with respect to the unknown function $x$:

    $\begin{cases} \dot{x}(t) + x(t) - (I + \Lambda_\theta(x(t))A)^{-1}x(t) = 0; \\ x(0) = x_0. \end{cases}$    (21)


Let us first state a local existence result.

Proposition 2.1. For any $x_0 \in \Omega = \mathcal{H} \setminus A^{-1}(0)$, there exists some $\varepsilon > 0$ such that (20)
has a unique solution $(x, \lambda) : [0, \varepsilon] \to \mathcal{H} \times \mathbb{R}_{++}$. Equivalently, (21) has a unique solution
$x : [0, \varepsilon] \to \mathcal{H}$. For this solution, $x(\cdot)$ is $C^1$, and $\lambda(\cdot)$ is locally Lipschitz continuous.

Proof. We use the reformulation of (20) as an autonomous differential equation, as described
in (21). Equivalently

    $\dot{x}(t) = F(x(t))$,

with $F(x)$ as in (18). By Proposition 1.6, the vector field $F$ is locally Lipschitz continuous
on the open set $\Omega \subset \mathcal{H}$. Hence, by the Cauchy-Lipschitz theorem (local version), for any $x_0 \in \Omega$,
there exists a unique local solution $x : [0, \varepsilon] \to \mathcal{H}$ of (21), for some $\varepsilon > 0$. Equivalently,
there exists a unique local solution $(x, \lambda)$ of (20). Clearly $x$ is a classical $C^1$ orbit, and
$t \mapsto \lambda(t) = \Lambda_\theta(x(t)) = \frac{1}{\Gamma_\theta(x(t))}$ is Lipschitz continuous (by taking $\varepsilon$ sufficiently small), a
consequence of Lemma 1.5 and $x(t) \in \Omega$. □


In order to pass from a local to a global solution, we first establish some further properties
of the map $t \mapsto \lambda(t)$.

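Lemma 2.2. If $(x, \lambda) : [0, \varepsilon] \to \mathcal{H} \times \mathbb{R}_{++}$ is a solution of (20), then $|\dot{\lambda}(t)| \le \lambda(t)$ for almost
all $t \in [0, \varepsilon]$.

Proof. Take $t, t' \in [0, \varepsilon]$, $t \neq t'$. Then

    $|\lambda(t') - \lambda(t)| = \lambda(t)\lambda(t')\left|\frac{1}{\lambda(t)} - \frac{1}{\lambda(t')}\right|$    (22)

    $= \lambda(t)\lambda(t')\,|\Gamma_\theta(x(t)) - \Gamma_\theta(x(t'))|$    (23)

    $\le \lambda(t)\lambda(t')\frac{\|x(t) - x(t')\|}{\theta}$,    (24)

where the last inequality follows from Lemma 1.5. Therefore

    $\limsup_{t' \to t}\frac{|\lambda(t') - \lambda(t)|}{|t' - t|} \le \lim_{t' \to t}\frac{\lambda(t)\lambda(t')\|x(t') - x(t)\|}{\theta|t' - t|} = \lambda(t)^2\|\dot{x}(t)\|/\theta = \lambda(t)$.    (25)

□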













Lemma 2.3. If $(x, \lambda) : [0, \varepsilon] \to \mathcal{H} \times \mathbb{R}_{++}$ is a solution of (20), then $\lambda(\cdot)$ is non-decreasing.

Proof. Since $\lambda$ is locally Lipschitz continuous, to prove that it is non-decreasing it suffices to
show that $\dot{\lambda}(t) \ge 0$ for almost all $t \in [0, \varepsilon]$. Take $t \in [0, \varepsilon[$ and define

    $\mu = \lambda(t)$, $y = J_\mu x(t)$, $v = \mu^{-1}(x(t) - y)$.

Observe that $v \in A(y)$ and $\mu v + y - x(t) = 0$. Define

    $z_h = x(t) + h\dot{x}(t)$, $0 < h < \min\{\varepsilon - t, 1\}$.

Since $\dot{x}(t) = -\mu v$, we have $(1 - h)\mu v + y - z_h = 0$, $J_{(1-h)\mu}z_h = y$ and so

    $\varphi((1 - h)\mu, z_h) = (1 - h)\mu\|y - z_h\| = (1 - h)^2\mu\|y - x(t)\| = (1 - h)^2\theta$.

Therefore, using the triangle inequality, the second inequality in Lemma 1.3 and Proposition 1.2,
we obtain

    $\varphi(\mu, x(t + h)) \le \varphi(\mu, z_h) + |\varphi(\mu, x(t + h)) - \varphi(\mu, z_h)|$

    $\le \frac{\varphi((1 - h)\mu, z_h)}{(1 - h)^2} + \mu\|x(t + h) - z_h\|$

    $= \theta + \mu\|x(t + h) - x(t) - h\dot{x}(t)\|$.

To simplify the notation, define

    $\rho_h = \frac{\mu\|x(t + h) - x(t) - h\dot{x}(t)\|}{\theta}$.

Observe that $\rho_h \ge 0$ (for $0 < h < \min\{\varepsilon - t, 1\}$), and $\lim_{h \to 0^+}\rho_h/h = 0$. Now, the above
inequality can be written as

    $\varphi(\mu, x(t + h)) \le \theta(1 + \rho_h)$.

It follows from this inequality, the non-negativity of $\rho_h$ and Lemma 1.3 that

    $\varphi\left(\frac{\mu}{1 + \rho_h}, x(t + h)\right) \le \theta$.

Since $\varphi(\cdot, x(t + h))$ is strictly increasing, and $\varphi(\lambda(t + h), x(t + h)) = \theta$,

    $\lambda(t + h) \ge \frac{\mu}{1 + \rho_h} = \frac{\lambda(t)}{1 + \rho_h}$.

Therefore

    $\liminf_{h \to 0^+}\frac{\lambda(t + h) - \lambda(t)}{h} \ge \lim_{h \to 0^+}\frac{1}{h}\left(\frac{\lambda(t)}{1 + \rho_h} - \lambda(t)\right) = -\lim_{h \to 0^+}\frac{\rho_h/h}{1 + \rho_h}\lambda(t) = 0$.

□

In view of Proposition 2.1, there exists a solution of (21) defined on a maximal interval.
Next we will prove that this maximal interval is $[0, +\infty[$.
Theorem 2.4. For any $x_0 \in \Omega = \mathcal{H} \setminus A^{-1}(0)$, there exists a unique global solution $(x, \lambda) :
[0, +\infty[ \to \mathcal{H} \times \mathbb{R}_{++}$ of the Cauchy problem (20). Equivalently, (21) has a unique solution
$x : [0, +\infty[ \to \mathcal{H}$. For this solution, $x(\cdot)$ is $C^1$, and $\lambda(\cdot)$ is locally Lipschitz continuous.
Moreover,

i) $\lambda(\cdot)$ is non-decreasing;

ii) $t \mapsto \|J_{\lambda(t)}x(t) - x(t)\|$ is non-increasing;

iii) For any $0 \le t_0 \le t_1$

    $\lambda(t_0) \le \lambda(t_1) \le e^{(t_1 - t_0)}\lambda(t_0)$,

    $\|J_{\lambda(t_0)}x(t_0) - x(t_0)\|e^{-(t_1 - t_0)} \le \|J_{\lambda(t_1)}x(t_1) - x(t_1)\| \le \|J_{\lambda(t_0)}x(t_0) - x(t_0)\|$.

Proof. According to a standard argument, we argue by contradiction and assume that the
maximal solution $x(\cdot)$ of (21) is defined on an interval $[0, T_{max}[$ with $T_{max} < +\infty$. By
Lemmas 2.2 and 2.3, $\lambda(\cdot)$ is non-decreasing, and satisfies $0 \le \dot{\lambda}(t) \le \lambda(t)$ for almost all
$t \in [0, T_{max}[$. By integration of this inequality, we obtain, for any $t \in [0, T_{max}[$

    $0 < \lambda(0) \le \lambda(t) \le \lambda(0)e^t$.    (26)

Since $t < T_{max}$, we infer that $\lim_{t \to T_{max}}\lambda(t) := \lambda_m$ exists and is finite. Moreover, by (20)

    $\|\dot{x}(t)\| = \|(I + \lambda(t)A)^{-1}x(t) - x(t)\| = \frac{\theta}{\lambda(t)}$.    (27)

Combining (26) and (27), we obtain that $\|\dot{x}(t)\|$ stays bounded when $t \in [0, T_{max}[$. By a
classical argument, this implies that $\lim_{t \to T_{max}}x(t) := x_m$ exists.

Moreover, by the second inequality in (26), $\|(I + \lambda(t)A)^{-1}x(t) - x(t)\| = \frac{\theta}{\lambda(t)}$ stays bounded
away from zero. Hence, at the limit, we have $\|(I + \lambda_m A)^{-1}x_m - x_m\| \neq 0$, which means that
$x_m \in \Omega = \mathcal{H} \setminus A^{-1}(0)$. Thus, we can apply again the local existence result, Proposition
2.1, with Cauchy data $x_m$, and so obtain a solution defined on an interval strictly larger
than $[0, T_{max}[$. This is a clear contradiction. Properties i), ii), iii) are direct consequences of
Lemmas 2.2 and 2.3. More precisely, by integration of $0 \le \dot{\lambda}(t) \le \lambda(t)$ between $t_0$ and $t_1 > t_0$,
we obtain $\lambda(t_0) \le \lambda(t_1) \le e^{(t_1 - t_0)}\lambda(t_0)$. As a consequence

    $\|J_{\lambda(t_1)}x(t_1) - x(t_1)\| = \frac{\theta}{\lambda(t_1)} \le \frac{\theta}{\lambda(t_0)} = \|J_{\lambda(t_0)}x(t_0) - x(t_0)\|$,

and

    $\|J_{\lambda(t_1)}x(t_1) - x(t_1)\| = \frac{\theta}{\lambda(t_1)} \ge \frac{\theta}{\lambda(t_0)e^{(t_1 - t_0)}} = e^{-(t_1 - t_0)}\|J_{\lambda(t_0)}x(t_0) - x(t_0)\|$.

Remark 2.5. Property iii) of Theorem 2.4, with $t_0 = 0$, namely $\|J_{\lambda(0)}x_0 - x_0\|e^{-t} \le
\|J_{\lambda(t)}x(t) - x(t)\|$, implies that for all $t \ge 0$, we have $J_{\lambda(t)}x(t) - x(t) \neq 0$. Equivalently
$x(t) \notin A^{-1}(0)$, i.e., the system cannot be stabilized in a finite time. Stabilization can be
achieved only asymptotically, which is the subject of the next section.

□ 
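
To get a feeling for the closed-loop dynamic, one can integrate (21) numerically. The sketch below is only an illustration under explicit assumptions: $A = \mathrm{diag}(a)$ on $\mathbb{R}^3$, an explicit Euler scheme with step $h$ (a choice made here for simplicity, not a scheme analyzed in this section), and $\Lambda_\theta$ computed by bisection through the hypothetical helper `Lambda`. Repeating the argument of Lemma 2.3 with $z_h$ equal to the Euler iterate suggests that the discrete sequence $\lambda_k$ is non-decreasing with $\lambda_{k+1} \le \lambda_k / (1 - h)$, a discrete counterpart of $\lambda(t_0) \le \lambda(t_1) \le e^{t_1 - t_0}\lambda(t_0)$; the code records the successive ratios so that this can be inspected.

```python
import numpy as np

a = np.array([0.5, 1.0, 3.0])        # A = diag(a), a hypothetical test operator
theta, h, n_steps = 1.0, 0.05, 200   # illustrative parameter values

def J(lam, x):
    """Resolvent (I + lam*A)^{-1} x for A = diag(a)."""
    return x / (1.0 + lam * a)

def phi(lam, x):
    return lam * np.linalg.norm(x - J(lam, x))

def Lambda(x, tol=1e-13):
    """Closed-loop parameter: root of phi(., x) = theta, by bisection."""
    lo, hi = 0.0, 1.0
    while phi(hi, x) < theta:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if phi(mid, x) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = np.array([1.0, -2.0, 0.5])
lams = []
for _ in range(n_steps):
    lam = Lambda(x)
    lams.append(lam)
    x = x + h * (J(lam, x) - x)      # explicit Euler step on x' = J_lam x - x

lams = np.array(lams)
ratios = lams[1:] / lams[:-1]        # expected to lie in [1, 1/(1-h)]
```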
3 Asymptotic behavior

3.1 Weak convergence

To prove the weak convergence of trajectories of system (4), we use the classical Opial lemma
[?], that we recall in its continuous form; see also [8], who initiated the use of this argument
to analyze the asymptotic convergence of nonlinear contraction semigroups in Hilbert spaces.

Lemma 3.1. Let $S$ be a non empty subset of $\mathcal{H}$, and $x : [0, +\infty[ \to \mathcal{H}$ a map. Assume that

(i) for every $z \in S$, $\lim_{t \to +\infty}\|x(t) - z\|$ exists;

(ii) every weak sequential cluster point of the map $x$ belongs to $S$.

Then

    $w$-$\lim_{t \to +\infty}x(t) = x_\infty$ exists, for some element $x_\infty \in S$.

Let us state our main convergence result.

Theorem 3.2. Suppose that $A^{-1}(0) \neq \emptyset$. Given $x_0 \notin A^{-1}(0)$, let $(x, \lambda) : [0, +\infty[ \to \mathcal{H} \times \mathbb{R}_{++}$
be the unique global solution of the Cauchy problem (20). Set $d_0 = d(x_0, A^{-1}(0))$ the distance
from $x_0$ to $A^{-1}(0)$. Then, the following properties hold:

i) $\|\dot{x}(t)\| = \|x(t) - J_{\lambda(t)}x(t)\| \le d_0/\sqrt{2t}$; hence $\lim_{t \to +\infty}\|\dot{x}(t)\| = 0$;

ii) $\lambda(t) \ge \theta\sqrt{2t}/d_0$; hence $\lim_{t \to +\infty}\lambda(t) = +\infty$;

iii) $w$-$\lim_{t \to +\infty}x(t) = x_\infty$ exists, for some $x_\infty \in A^{-1}(0)$.

Moreover, for any $z \in A^{-1}(0)$, $\|x(t) - z\|$ is decreasing.

Proof. Define

    $v(t) = \lambda(t)^{-1}(x(t) - J_{\lambda(t)}x(t))$.    (28)

Observe that $v(t) \in A(J_{\lambda(t)}x(t))$ and $\lambda(t)v(t) + J_{\lambda(t)}x(t) - x(t) = 0$. For any $z \in A^{-1}(0)$, and
any $t \ge 0$ set

    $h_z(t) := \frac{1}{2}\|x(t) - z\|^2$.    (29)

After derivation of $h_z$, and using the differential relation in (20) we obtain

    $\dot{h}_z(t) = \langle x(t) - z, \dot{x}(t) \rangle$    (30)

    $= -\langle x(t) - z, x(t) - J_{\lambda(t)}x(t) \rangle = -\|x(t) - J_{\lambda(t)}x(t)\|^2 - \langle J_{\lambda(t)}x(t) - z, \lambda(t)v(t) \rangle$.    (31)

Since $v(t) \in A(J_{\lambda(t)}x(t))$, $0 \in A(z)$, and $A$ is (maximal) monotone

    $\dot{h}_z(t) \le -\|x(t) - J_{\lambda(t)}x(t)\|^2$.    (32)

Hence, $h_z$ is non-increasing. Moreover, by integration of (32), for any $t > 0$

    $\frac{1}{2}\|z - x(0)\|^2 \ge h_z(0) - h_z(t) = -\int_0^t\dot{h}_z(u)\,du \ge \int_0^t\|J_{\lambda(u)}x(u) - x(u)\|^2\,du \ge t\|J_{\lambda(t)}x(t) - x(t)\|^2$,

where the last inequality follows from $t \mapsto \|J_{\lambda(t)}x(t) - x(t)\| = \|\dot{x}(t)\|$ being non-increasing (see Theorem
2.4 ii)). Item i) follows trivially from the above inequality. Item ii) follows from item i) and
the algebraic relation between $x$ and $\lambda$ in (20). To prove item iii), we use Lemma 3.1 with
$S = A^{-1}(0)$. Since $z$ in (29) is a generic element of $A^{-1}(0)$, it follows from (32) that item (i)
of Lemma 3.1 holds. Let us now prove that item (ii) of Lemma 3.1 also holds. Let $x_\infty$ be
a weak sequential cluster point of the orbit $x(\cdot)$. Since $\|x(t) - J_{\lambda(t)}x(t)\| \to 0$ as $t \to \infty$, we
also have that $x_\infty$ is a weak sequential cluster point of $J_{\lambda(\cdot)}x(\cdot)$. Now observe that in view of
items i) and ii), for any $t > 0$

    $\|v(t)\| \le \frac{d_0^2}{2\theta t}$.    (33)

Hence, $v(t)$ converges strongly to zero as $t$ tends to infinity. Since $v(t) \in A(J_{\lambda(t)}x(t))$, and
the graph of $A$ is demi-closed, we obtain $0 \in A(x_\infty)$, i.e., $x_\infty \in S$. □
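
The estimates i) and ii) of Theorem 3.2 can be compared with a simulated trajectory. The sketch below is illustrative only and rests on several assumptions: $A = \mathrm{diag}(a)$ with $a_i > 0$ on $\mathbb{R}^3$, so that $A^{-1}(0) = \{0\}$ and $d_0 = \|x_0\|$; an explicit Euler discretization of (21) with step $h$; and the hypothetical bisection helper `Lambda`. It checks the bounds $\|\dot{x}(t)\| \le d_0/\sqrt{2t}$ and $\lambda(t) \ge \theta\sqrt{2t}/d_0$ along the computed iterates.

```python
import numpy as np

a = np.array([0.5, 1.0, 3.0])         # A = diag(a) with a > 0, so A^{-1}(0) = {0}
theta, h, n_steps = 1.0, 0.05, 200    # illustrative parameter values

def J(lam, x):
    return x / (1.0 + lam * a)

def phi(lam, x):
    return lam * np.linalg.norm(x - J(lam, x))

def Lambda(x, tol=1e-13):
    """Root of phi(., x) = theta, by bisection."""
    lo, hi = 0.0, 1.0
    while phi(hi, x) < theta:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if phi(mid, x) < theta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = np.array([1.0, -2.0, 0.5])
d0 = np.linalg.norm(x)                # distance from x0 to the solution set {0}
speed_ok, lambda_ok = True, True
for k in range(n_steps):
    t = k * h
    lam = Lambda(x)
    speed = np.linalg.norm(J(lam, x) - x)          # ||x'(t)|| = theta / lambda(t)
    if k > 0:                                      # the bounds are vacuous at t = 0
        speed_ok = speed_ok and bool(speed <= d0 / np.sqrt(2.0 * t))
        lambda_ok = lambda_ok and bool(lam >= theta * np.sqrt(2.0 * t) / d0)
    x = x + h * (J(lam, x) - x)                    # explicit Euler step
```

On this example the bounds hold with a wide margin, which is consistent with the faster rates obtained under an error bound assumption in the next subsection.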

3.2 Superlinear convergence under an error bound assumption

In this section, we assume that the solution set $S = A^{-1}(0)$ is non-empty and that, whenever
$v \in A(x)$ is "small", its norm provides a bound for the distance of $x$ to $S$. Precisely,

A0) $S = A^{-1}(0)$ is non-empty, and there exist $\varepsilon, \kappa > 0$ such that

    $v \in A(x)$, $\|v\| \le \varepsilon$ $\Longrightarrow$ $d(x, S) \le \kappa\|v\|$.

Theorem 3.3. Assuming A0), then $x(t)$ converges strongly to some $x^* \in A^{-1}(0)$, and for
any $\alpha \in (0, 1)$ there exist positive reals $c_0, c_1, c_2, c_3$ such that

    $d(x(t), S) \le c_0e^{-\alpha t}$, $\lambda(t) \ge c_1e^{\alpha t}$, $\|v(t)\| \le c_2e^{-2\alpha t}$, $\|x(t) - x^*\| \le c_3e^{-\alpha t}$.

Proof. Let $P_S$ be the projection on the closed convex set $S = A^{-1}(0)$. Write $y(t) = J_{\lambda(t)}x(t)$ and define, for $t \ge 0$,

    $x^*(t) = P_S(x(t))$, $y^*(t) = P_S(y(t))$.

It follows from the assumption $A^{-1}(0) \neq \emptyset$, and from (33) (inside the proof of Theorem 3.2)
that $\lim_{t \to \infty}v(t) = 0$. By A0), and $v(t) = \lambda(t)^{-1}(x(t) - y(t)) \in A(y(t))$, we have that, for $t$
large enough, say $t \ge t_0$

    $d(y(t), S) = \|y(t) - y^*(t)\| \le \kappa\|v(t)\|$.    (34)

Hence

    $\|x(t) - x^*(t)\| \le \|x(t) - y^*(t)\| \le \|x(t) - y(t)\| + \|y(t) - y^*(t)\|$

    $\le \|x(t) - y(t)\| + \kappa\|v(t)\|$

    $= \left(1 + \kappa\lambda(t)^{-1}\right)\|x(t) - y(t)\|$.

Take $\alpha \in (0, 1)$. Since $\lambda(t) \to \infty$ as $t \to \infty$, for $t$ large enough

    $\|x(t) - x^*(t)\| \le \alpha^{-1/2}\|x(t) - y(t)\|$.    (35)
Define

    $g(t) := \frac{1}{2}d^2(x(t), S) = \frac{1}{2}\|x(t) - x^*(t)\|^2$.

Using successively the classical derivation chain rule, and (4), we obtain

    $g'(t) = \langle x(t) - x^*(t), \dot{x}(t) \rangle$

    $= -\langle x(t) - x^*(t), x(t) - y(t) \rangle$

    $= -\|x(t) - y(t)\|^2 - \langle y(t) - x^*(t), x(t) - y(t) \rangle$.

By the monotonicity of $A$, and $\lambda(t)^{-1}(x(t) - y(t)) \in A(y(t))$, $0 \in A(x^*(t))$, we have

    $\langle y(t) - x^*(t), x(t) - y(t) \rangle \ge 0$.

Combining the two above relations, we obtain

    $g'(t) \le -\|x(t) - y(t)\|^2$.    (36)

From (35), (36), and the definition of $g$, we infer

    $g'(t) \le -2\alpha g(t)$,

and it follows from Gronwall's lemma that $g(t) \le g(t_0)e^{-2\alpha(t - t_0)}$, which proves the first inequality.

To prove the second inequality, we use the inequality

    $\|x(t) - y(t)\| \le d(x(t), S)$,    (37)

which is a direct consequence of the $\frac{1}{\lambda}$-Lipschitz continuity of $A_\lambda$. For $z \in S$, since $A_{\lambda(t)}z = 0$

    $\|A_{\lambda(t)}x(t)\| = \|A_{\lambda(t)}x(t) - A_{\lambda(t)}z\| \le \frac{1}{\lambda(t)}\|x(t) - z\|$.

Equivalently, $\|x(t) - y(t)\| \le \|x(t) - z\|$ for all $z \in S$, which gives (37). Then use the first
inequality, and the equality $\lambda(t)\|x(t) - y(t)\| = \theta$, and so obtain the second inequality.

The third inequality follows from the second one, and the equality $\lambda(t)^2\|v(t)\| = \theta$.

To prove the last inequality, observe that for $t_1 < t_2$,

    $\|x(t_2) - x(t_1)\| \le \int_{t_1}^{t_2}\|\dot{x}(t)\|\,dt = \int_{t_1}^{t_2}\|x(t) - y(t)\|\,dt \le \int_{t_1}^{t_2}d(x(t), S)\,dt$,

where the last inequality comes from (37). Together with the first inequality of the theorem, this gives
the strong convergence of $x(t)$, and the last inequality of the theorem follows. □


Remark 3.4. In the Appendix, in the case of an isotropic linear monotone operator, we can
perform an explicit computation of $x$, $\lambda$, and observe that their rates of convergence are in
accordance with the conclusions of Theorem 3.3.
3.3 Weak versus strong convergence

A famous counterexample due to Baillon [6] shows that the trajectories of the steepest descent
dynamical system associated to a convex potential can converge weakly but not strongly. The
existence of such a counterexample for (4) is an interesting open question, whose study goes
beyond this work. In the following theorem, we provide some practically important situations
where the strong convergence holds for system (4).

Theorem 3.5. Assuming $S = A^{-1}(0)$ is non-empty, then $x(t)$ converges strongly to some
$x^* \in A^{-1}(0)$ in the following situations:

i) $A$ is strongly monotone;

ii) $A = \partial f$, where $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is a proper closed convex function, which is
boundedly inf-compact;

iii) $S = A^{-1}(0)$ has a nonempty interior.

Proof. i) If $A^{-1}$ is Lipschitz continuous at 0, then assumption A0) holds, and, by Theorem
3.3, each trajectory $x(t)$ of (4) converges strongly to some $x^* \in A^{-1}(0)$. In particular, if $A$ is
strongly monotone, i.e., there exists a positive constant $\alpha$ such that for any $y_i \in Ax_i$, $i = 1, 2$

    $\langle y_2 - y_1, x_2 - x_1 \rangle \ge \alpha\|x_2 - x_1\|^2$,

then $A^{-1}$ is Lipschitz continuous. In that case, $A^{-1}(0)$ is reduced to a single element $z$, and
each trajectory $x(t)$ of (4) converges strongly to $z$, with the rate of convergence given by
Theorem 3.3.

ii) $A = \partial f$, where $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is a proper closed convex function, which is
supposed to be boundedly inf-compact, i.e., for any $R > 0$ and $l \in \mathbb{R}$,

    $\{x \in \mathcal{H} : f(x) \le l, \text{ and } \|x\| \le R\}$ is relatively compact in $\mathcal{H}$.

By Corollary 5.3, $t \mapsto f(x(t))$ is non-increasing, and $x(\cdot)$ is contained in a sublevel set of $f$.
Thus, the orbit $x(\cdot)$ is relatively compact, and converges weakly. Hence, it converges strongly.

iii) Suppose now that $S = A^{-1}(0)$ has a nonempty interior. Then there exist $r > 0$ and
$p \in A^{-1}(0)$ such that the ball $B(p, r)$ of radius $r$ centered at $p$ is contained in $S$. For any
given $\lambda > 0$, we have $A^{-1}(0) = A_\lambda^{-1}(0)$. Hence, for any $\lambda > 0$, we have $B(p, r) \subset A_\lambda^{-1}(0)$. By
the monotonicity property of $A_\lambda$, for any $\xi \in \mathcal{H}$, $\lambda > 0$, and $h \in \mathcal{H}$ with $\|h\| \le 1$,

    $\langle A_\lambda(\xi), \xi - (p + rh) \rangle \ge 0$.

Hence

    $r\|A_\lambda(\xi)\| = r\sup_{\|h\| \le 1}\langle A_\lambda(\xi), h \rangle \le \langle A_\lambda(\xi), \xi - p \rangle$.    (38)

The ODE (4) can be written as $\dot{x}(t) + \lambda(t)A_{\lambda(t)}x(t) = 0$. Taking $\lambda = \lambda(t)$, and $\xi = x(t)$ in (38),
we obtain

    $\|\dot{x}(t)\| = \lambda(t)\|A_{\lambda(t)}(x(t))\| \le \frac{\lambda(t)}{r}\langle A_{\lambda(t)}(x(t)), x(t) - p \rangle$.

Using again (4) we obtain

    $\|\dot{x}(t)\| \le -\frac{1}{r}\langle \dot{x}(t), x(t) - p \rangle$.    (39)
\\x{t)\\ <-^{x{t),x(t)-p)) . (39) 
The end of the proof follows standard arguments, see for example [16, Proposition 60].
Inequality (39) implies, for any $0 \le s \le t$

    $\|x(t) - x(s)\| \le \int_s^t\|\dot{x}(\tau)\|\,d\tau$

    $\le -\frac{1}{r}\int_s^t\langle \dot{x}(\tau), x(\tau) - p \rangle\,d\tau$

    $= \frac{1}{2r}\left(\|x(s) - p\|^2 - \|x(t) - p\|^2\right)$.

By Theorem 3.2 iii), $\|x(t) - p\|$ is convergent. As a consequence, the trajectory $x(\cdot)$ has the
Cauchy property in the Hilbert space $\mathcal{H}$, and hence converges strongly. □


4 A link with the regularized Newton system

In this section, we show how the dynamical system (4) is linked with the regularized Newton
system proposed and analyzed in [?, ?, ?]. Given $x_0 \notin A^{-1}(0)$, let $(x, \lambda) : [0, +\infty[ \to
\mathcal{H} \times \mathbb{R}_{++}$ be the unique global solution of the Cauchy problem (20). For any $t \ge 0$ define

    $y(t) = (I + \lambda(t)A)^{-1}x(t)$, $v(t) = \frac{1}{\lambda(t)}(x(t) - y(t))$.    (40)

We are going to show that $y(\cdot)$ is solution of a regularized Newton system. For proving this
result, we first establish some further properties satisfied by $y(\cdot)$.

Proposition 4.1. For $y(\cdot)$ and $v(\cdot)$ as defined in (40) it holds that

i) $v(t) \in Ay(t)$, $\lambda(t)v(t) + y(t) - x(t) = 0$, and $\dot{x}(t) = y(t) - x(t)$ for all $t \ge 0$;

ii) $v(\cdot)$ and $y(\cdot)$ are locally Lipschitz continuous;

iii) $\dot{y}(t) + \lambda(t)\dot{v}(t) + (\lambda(t) + \dot{\lambda}(t))v(t) = 0$ for almost all $t \ge 0$;

iv) $\langle \dot{y}(t), \dot{v}(t) \rangle \ge 0$ and $\langle \dot{y}(t), v(t) \rangle \le 0$ for almost all $t \ge 0$;

v) $\|v(\cdot)\|$ is non-increasing.

Proof. Item i) follows trivially from (40) and (4). Item ii) follows from the local Lipschitz
continuity of $\lambda$, and the properties of the resolvent, see Proposition 1.1. Hence $x, y, \lambda, v$ are
differentiable almost everywhere. By differentiating $\lambda v + y - x = 0$, and using $\dot{x} = y - x$, we
obtain item iii). To prove item iv), assume that $y$ and $v$ are differentiable at $t \ge 0$. It follows
from the monotonicity of $A$ and the first relation in item i) that if $t' \neq t$ and $t' \ge 0$

    $\frac{\langle y(t') - y(t), v(t') - v(t) \rangle}{(t' - t)^2} \ge 0$.

Passing to the limit as $t' \to t$ in the above inequality, we conclude that the first inequality in
item iv) holds. To prove the last inequality, assume that $\lambda(\cdot)$ is also differentiable at $t$. Using
item iii), after scalar multiplication by $\dot{y}(t)$, we obtain

    $\|\dot{y}(t)\|^2 + \lambda(t)\langle \dot{y}(t), \dot{v}(t) \rangle + (\lambda(t) + \dot{\lambda}(t))\langle \dot{y}(t), v(t) \rangle = 0$.

To end the proof of item iv), note that $\dot{\lambda}(t) \ge 0$ (by Theorem 2.4 i), $\lambda(\cdot)$ is non-decreasing),
and use the first inequality of item iv). In view of (40) and (4), $\lambda^2(t)\|v(t)\| = \theta$ for all $t \ge 0$.
This result, together with Lemma 2.3, proves item v). □
Hence (almost everywhere) $y(\cdot)$ and $v(\cdot)$, as defined in (40), satisfy the differential inclusion

    $\begin{cases} v(t) \in Ay(t); \\ \dot{y}(t) + \lambda(t)\dot{v}(t) + (\lambda(t) + \dot{\lambda}(t))v(t) = 0. \end{cases}$    (41)

Recall that $\lambda(\cdot)$ is locally absolutely continuous, and satisfies almost everywhere

    $0 \le \dot{\lambda}(t) \le \lambda(t)$.

Let us consider the time rescaling defined by

    $\tau(t) = \int_0^t\frac{\lambda(u) + \dot{\lambda}(u)}{\lambda(u)}\,du = t + \ln(\lambda(t)/\lambda(0))$.    (42)

Since $1 \le \frac{\lambda(u) + \dot{\lambda}(u)}{\lambda(u)} \le 2$, we have $t \le \tau(t) \le 2t$. Hence $t \mapsto \tau(t)$ is a monotone function
which increases from 0 to $+\infty$ as $t$ grows from 0 to $+\infty$. The link with the regularized Newton
system is made precise in the following statement.

Theorem 4.2. For $y(\cdot)$ and $v(\cdot)$ as defined in (40), let us set $y(t) = \tilde{y}(\tau(t))$, $v(t) = \tilde{v}(\tau(t))$,
where the time rescaling is given by $\tau(t) = \int_0^t\frac{\lambda(u) + \dot{\lambda}(u)}{\lambda(u)}\,du$. Then, $(\tilde{y}, \tilde{v})$ is solution of the
regularized Newton system

    $\begin{cases} \tilde{v} \in A\tilde{y}; \\ \frac{1}{\lambda \circ \tau^{-1}}\frac{d\tilde{y}}{d\tau} + \frac{d\tilde{v}}{d\tau} + \tilde{v} = 0. \end{cases}$    (43)

This is the regularized Newton system which has been studied in [5]. The (Levenberg-
Marquardt) regularization parameter is equal to $\frac{1}{\lambda \circ \tau^{-1}}$. Since $\lambda(t)$ tends to infinity, the
regularization parameter converges to zero as $\tau$ tends to infinity. This makes our system
asymptotically close to the Newton method. We may expect fast convergence properties.
This is precisely the subject of the next section. Let us complete this section with the following
relation, which allows one to recover $x$ from $y$.


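Lemma 4.3. For any $t_2 > t_1 \ge 0$

\[
x(t_2) = \int_0^{\Delta t} \left[ (1 - e^{-\Delta t})\, y(t_1 + u) + e^{-\Delta t} x(t_1) \right] \frac{e^{u}}{e^{\Delta t} - 1}\, du,
\]

where $\Delta t = t_2 - t_1$.

Proof. It suffices to prove the equality for $t_1 = 0$ and $t_2 = t = \Delta t$. Since $\dot x = y - x$, trivially
$\dot x + x = y$. So

\[
e^{t} x(t) - x_0 = \int_0^t e^{u} y(u)\, du.
\]

Whence

\[
x(t) = e^{-t} x_0 + e^{-t}\int_0^t e^{u} y(u)\, du
= e^{-t} \int_0^t e^{u} \left[ (e^{t} - 1)\, y(u) + x_0 \right] \frac{du}{e^{t} - 1}
= \int_0^t \left[ (1 - e^{-t})\, y(u) + e^{-t} x_0 \right] \frac{e^{u}}{e^{t} - 1}\, du,
\]

which is the desired equality. □

5 The subdifferential case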


From now on, in this section, we assume that $A = \partial f$, where $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is a proper
closed convex function. Let us recall the generalized derivation chain rule from Brezis [7] that
will be useful:

Lemma 5.1. [7, Lemme 4, p. 73] Let $\Phi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ be a closed convex proper function.
Let $u \in L^2(0,T;\mathcal{H})$ be such that $\dot u \in L^2(0,T;\mathcal{H})$, and $u(t) \in \operatorname{dom}(\partial\Phi)$ for a.e. $t$. Assume
that there exists $\xi \in L^2(0,T;\mathcal{H})$ such that $\xi(t) \in \partial\Phi(u(t))$ for a.e. $t$. Then the function
$t \mapsto \Phi(u(t))$ is absolutely continuous, and for every $t$ such that $u$ and $\Phi(u)$ are differentiable
at $t$, and $u(t) \in \operatorname{dom}(\partial\Phi)$, we have

\[
\forall h \in \partial\Phi(u(t)), \qquad \frac{d}{dt}\Phi(u(t)) = \langle \dot u(t), h\rangle.
\]

5.1 Minimizing property 

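Since $v(t) \in \partial f(y(t))$, $\lambda(t)v(t) = x(t) - y(t)$, and $\lambda(t)^2\|v(t)\| = \theta$, by the convex subdifferential
inequality

\[
f(x(t)) \ge f(y(t)) + \langle x(t) - y(t), v(t)\rangle \ge f(y(t)) + \lambda(t)\|v(t)\|^2
= f(y(t)) + \sqrt{\theta}\,\|v(t)\|^{3/2}. \tag{44}
\]

Lemma 5.2. The function $t \mapsto f(y(t))$ is locally Lipschitz continuous, non-increasing and
for any $t_2 > t_1 \ge 0$,

\[
f(x(t_2)) \le \int_0^{\Delta t} \left[ (1 - e^{-\Delta t})\, f(y(t_1 + u)) + e^{-\Delta t} f(x(t_1)) \right] \frac{e^{u}}{e^{\Delta t} - 1}\, du \tag{45}
\]
\[
\le (1 - e^{-\Delta t})\, f(y(t_1)) + e^{-\Delta t} f(x(t_1)) \tag{46}
\]

where $\Delta t = t_2 - t_1$.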

Proof. Suppose that $t_1, t_2 \ge 0$, $t_1 \ne t_2$ and let

\[ y_1 = y(t_1), \quad v_1 = v(t_1), \quad y_2 = y(t_2), \quad v_2 = v(t_2). \]

Since $v_i \in \partial f(y_i)$ for $i = 1,2$

\[ f(y_2) \ge f(y_1) + \langle y_2 - y_1, v_1\rangle, \qquad f(y_1) \ge f(y_2) + \langle y_1 - y_2, v_2\rangle. \]

Therefore

\[ \langle y_2 - y_1, v_1\rangle \le f(y_2) - f(y_1) \le \langle y_2 - y_1, v_2\rangle \]

and

\[ |f(y_1) - f(y_2)| \le \|y_1 - y_2\| \max\{\|v_1\|, \|v_2\|\} \le \|y_1 - y_2\|\,\|v(0)\| \]

where in the last inequality, we use that $\|v(\cdot)\|$ is non-increasing (see Proposition 4.1, item
v)). Since $t \mapsto y(t)$ is locally Lipschitz continuous, $t \mapsto f(y(t))$ is also locally Lipschitz
continuous on $[0, \infty[$. Moreover, $t \mapsto f(y(t))$ is differentiable almost everywhere. Since $y$ is
locally Lipschitz continuous, and $v(\cdot)$ is bounded, by Lemma 5.1 the derivation chain rule
holds true (indeed, it provides another proof of the absolute continuity of $t \mapsto f(y(t))$). Hence

\[ \frac{d}{dt} f(y(t)) = \langle \dot y(t), v(t)\rangle \le 0, \]

where in the last inequality, we use Proposition 4.1, item iv). Hence $t \mapsto f(y(t))$ is locally
Lipschitz continuous, and non-increasing. Let us now prove inequality (45). Without any
restriction we can take $t_1 = 0$ and $t_2 = t = \Delta t$. By Lemma 4.3

\[
x(t) = \int_0^t \left[ (1 - e^{-t})\, y(u) + e^{-t} x_0 \right] \frac{e^{u}}{e^{t} - 1}\, du. \tag{47}
\]

The conclusion follows from the convexity of $f$, Jensen's inequality, and the fact that
$t \mapsto f(y(t))$ is non-increasing. □

Corollary 5.3. If $f(x(0)) < +\infty$, then for any $t \ge 0$, we have

i) $f(x(t)) < +\infty$, (48)

ii) $t \mapsto f(x(t))$ is non-increasing, (49)

iii) $\displaystyle \limsup_{h \to 0^+} \frac{f(x(t+h)) - f(x(t))}{h} \le f(y(t)) - f(x(t)) \le -\sqrt{\theta}\,\|v(t)\|^{3/2}$. (50)

Proof. Take $t \ge 0$ and $h > 0$. Direct use of Lemma 5.2 with $t_1 = t$ and $t_2 = t + h$ yields

\[
\frac{f(x(t + h)) - f(x(t))}{h} \le \frac{1 - e^{-h}}{h}\,\bigl(f(y(t)) - f(x(t))\bigr),
\]

and the conclusion follows by taking the limsup as $h \to 0^+$ on both sides of this inequality,
and by using (44). □


5.2 Rate of convergence 

In this subsection, we assume that $f$ has minimizers. Let

\[ \bar z \in \operatorname{argmin} f, \qquad d_0 = \inf\{\|x_0 - z\| : z \text{ minimizes } f\} = \|x_0 - \bar z\|. \]

Since $v(t) \in \partial f(y(t))$, for any $t \ge 0$

\[
f(y(t)) - f(\bar z) \le \langle y(t) - \bar z, v(t)\rangle \le \|y(t) - \bar z\|\,\|v(t)\|
\le \|x(t) - \bar z\|\,\|v(t)\| \le d_0\|v(t)\|
\]

where we have used $y(t) = J_{\lambda(t)}(x(t))$, $\bar z = J_{\lambda(t)}(\bar z)$, $J_{\lambda(t)}$ nonexpansive, and $t \mapsto \|x(t) - \bar z\|$
non-increasing (see (32)). Combining the above inequality with (44), we conclude that for
any $t \ge 0$

\[
f(x(t)) \ge f(y(t)) + \bigl(f(y(t)) - f(\bar z)\bigr)^{3/2}\sqrt{\theta/d_0^3}. \tag{51}
\]

Now we will use the following auxiliary result, a direct consequence of the convexity property
of $r \mapsto r^{3/2}$.

Lemma 5.4. If $a, b, c \ge 0$ and $a \ge b + c\,b^{3/2}$ then

\[
b \le a - \frac{c\,a^{3/2}}{1 + (3c/2)\,a^{1/2}}.
\]

Proof. The non-trivial case is $a, c > 0$, which will be analyzed. Define

\[ \varphi : [0, \infty) \to \mathbb{R}, \qquad \varphi(t) = t + c\,t^{3/2}. \]

Observe that $\varphi$ is convex, and $a \ge \varphi(b)$. Let us write the convex differential inequality at $a$

\[ a \ge \varphi(b) \ge \varphi(a) + \varphi'(a)(b - a). \]

Since $\varphi(a) = a + c\,a^{3/2}$ and $\varphi'(a) = 1 + (3c/2)\,a^{1/2}$, simplification gives the desired result. □
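
As a quick numerical sanity check of Lemma 5.4 (not part of the original argument), the following sketch samples random triples $(a, b, c)$ satisfying the hypothesis $a \ge b + c\,b^{3/2}$ and verifies the conclusion. The function names, sampling ranges and tolerance are illustrative assumptions.

```python
import random


def lemma_5_4_upper_bound(a: float, c: float) -> float:
    """Right-hand side of Lemma 5.4: a - c*a^{3/2} / (1 + (3c/2)*a^{1/2})."""
    return a - c * a ** 1.5 / (1.0 + 1.5 * c * a ** 0.5)


def check_lemma_5_4(trials: int = 10_000, seed: int = 0) -> bool:
    """Return True if b <= bound(a, c) held on every random sample."""
    rng = random.Random(seed)
    for _ in range(trials):
        b = rng.uniform(0.0, 10.0)
        c = rng.uniform(0.0, 10.0)
        # Any a >= b + c*b^{3/2} satisfies the hypothesis of the lemma.
        a = b + c * b ** 1.5 + rng.uniform(0.0, 5.0)
        # Small tolerance for floating-point rounding only.
        if b > lemma_5_4_upper_bound(a, c) + 1e-9:
            return False
    return True
```

The check is only a plausibility test on sampled values; the proof above is what establishes the inequality.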

Proposition 5.5. For any $t \ge 0$,

\[
f(y) \le f(x) - \frac{\kappa\,(f(x) - f(\bar z))^{3/2}}{1 + (3\kappa/2)\,(f(x) - f(\bar z))^{1/2}}, \tag{52}
\]

where $x = x(t)$, $y = y(t)$ and $\kappa = \sqrt{\theta/d_0^3}$.

Proof. Subtracting $f(\bar z)$ on both sides of (51) we conclude that

\[
f(x(t)) - f(\bar z) \ge f(y(t)) - f(\bar z) + \bigl(f(y(t)) - f(\bar z)\bigr)^{3/2}\sqrt{\theta/d_0^3}.
\]

To end the proof, use Lemma 5.4 with $a = f(x(t)) - f(\bar z)$, $b = f(y(t)) - f(\bar z)$ and
$c = \sqrt{\theta/d_0^3}$. □

Theorem 5.6. Let us assume that $f(x(0)) < +\infty$. Set $\kappa = \sqrt{\theta/d_0^3}$. Then, for any $t \ge 0$

\[
f(x(t)) - f(\bar z) \le \frac{f(x_0) - f(\bar z)}{\left[ 1 + \dfrac{t\,\kappa\sqrt{f(x_0) - f(\bar z)}}{2 + 3\kappa\sqrt{f(x_0) - f(\bar z)}} \right]^2}.
\]

Proof. Set $\beta(t) := f(x(t)) - f(\bar z)$. Consider first the case where $\beta(\cdot)$ is locally Lipschitz
continuous. Combining Proposition 5.5 with Corollary 5.3 and taking into account that
$f(x(\cdot))$ is non-increasing, we conclude that, almost everywhere

\[
\frac{d\beta}{dt} \le -\frac{\kappa\,\beta^{3/2}}{1 + (3\kappa/2)\,\beta^{1/2}} \le -\frac{\kappa\,\beta^{3/2}}{1 + (3\kappa/2)\,\beta_0^{1/2}}
\]

where $\beta_0 = \beta(0) = f(x_0) - f(\bar z)$. Defining $u := 1/\sqrt{\beta}$
and substituting $\beta = 1/u^2$ in the above inequality, we conclude that

\[
-2u^{-3}\,\frac{du}{dt} \le -\frac{\kappa\,u^{-3}}{1 + (3\kappa/2)\,\beta_0^{1/2}}.
\]

Therefore, for any $t \ge 0$,

\[
u(t) \ge \frac{t\,\kappa}{2 + 3\kappa\,\beta_0^{1/2}} + \frac{1}{\beta_0^{1/2}}.
\]

To end the proof, substitute $u = 1/\sqrt{\beta}$ in the above inequality. In the general case, without
assuming $\beta$ locally Lipschitz, we can write the differential equation in terms of differential
measures ($\beta$ is non-increasing, hence it has a bounded variation, and its distributional
derivative is a Radon measure):

\[
d\beta + \frac{\kappa}{1 + (3\kappa/2)\,\beta_0^{1/2}}\,\beta^{3/2} \le 0.
\]

Let us regularize this equation by convolution, with the help of a smooth kernel $\rho_\varepsilon$ (note that
we use convolution in $\mathbb{R}$, whatever the dimension of $\mathcal{H}$, possibly infinite). By convexity of
$r \mapsto r^{3/2}$ and Jensen's inequality, we obtain that $\beta * \rho_\varepsilon$ is a smooth function that still satisfies
the differential inequality. Thus we are reduced to the preceding situation, with bounds which
are independent of $\varepsilon$, whence the result by passing to the limit as $\varepsilon \to 0$. □
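
To illustrate Theorem 5.6 on a toy case, the sketch below integrates the closed-loop dynamic $\dot x = y - x$, $y = (I + \lambda\partial f)^{-1}x$, $\lambda\|x - y\| = \theta$ for the one-dimensional function $f(x) = x^2/2$ (an assumption made purely for illustration, with $\operatorname{argmin} f = \{0\}$ and $d_0 = |x_0|$). For this $f$ one has $y = x/(1+\lambda)$, so the closed-loop equation reduces to $\lambda^2|x| = \theta(1+\lambda)$, which is solved explicitly. The explicit Euler scheme, step size and function names are choices of this sketch, not of the paper.

```python
import math


def closed_loop_lambda(x: float, theta: float) -> float:
    """Positive root of lam^2*|x| = theta*(1 + lam), valid for f(x) = x^2/2, x != 0."""
    ax = abs(x)
    return (theta + math.sqrt(theta * theta + 4.0 * theta * ax)) / (2.0 * ax)


def simulate_closed_loop(x0=1.0, theta=1.0, horizon=5.0, dt=1e-3):
    """Explicit Euler on x' = y - x; returns (x(T), max over t of f(x(t)) - bound(t))."""
    f = lambda z: 0.5 * z * z
    d0 = abs(x0)                      # distance from x0 to argmin f = {0}
    kappa = math.sqrt(theta / d0 ** 3)
    beta0 = f(x0)                     # f(x0) - min f
    rate = kappa * math.sqrt(beta0) / (2.0 + 3.0 * kappa * math.sqrt(beta0))
    x = x0
    worst_gap = -math.inf
    for k in range(round(horizon / dt) + 1):
        t = k * dt
        bound = beta0 / (1.0 + t * rate) ** 2     # right-hand side of Theorem 5.6
        worst_gap = max(worst_gap, f(x) - bound)
        lam = closed_loop_lambda(x, theta)
        y = x / (1.0 + lam)                       # resolvent of A = identity
        x += dt * (y - x)
    return x, worst_gap
```

On this example the trajectory decays exponentially, so the $O(1/t^2)$ estimate holds with a wide margin; the theorem is a worst-case bound over all closed convex $f$.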


Let us complete the convergence analysis by the following integral estimate. 

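Proposition 5.7. Suppose $S = \operatorname{argmin} f \ne \emptyset$. Then

\[
\int_0^{+\infty} \lambda(t)\,\bigl(f(y(t)) - \inf f\bigr)\, dt \le \frac{1}{2}\operatorname{dist}^2(x_0, S).
\]

Proof. Let us return to the proof of Theorem 3.2 with $A = \partial f$. Setting $h_z(t) := \frac{1}{2}\|x(t) - z\|^2$
with $z \in \operatorname{argmin} f$, by (30) we have

\[ \dot h_z(t) + \langle y(t) - z, \lambda(t)v(t)\rangle \le 0. \tag{53} \]

By the convex subdifferential inequality, and $v(t) \in \partial f(y(t))$, we have

\[ f(z) \ge f(y(t)) + \langle z - y(t), v(t)\rangle. \]

Combining the two above inequalities, we obtain

\[ \dot h_z(t) + \lambda(t)\,\bigl(f(y(t)) - \inf f\bigr) \le 0. \tag{54} \]

By integrating this inequality, and taking for $z$ the projection of $x_0$ onto $S$, we obtain the announced result. □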


6 A large-step proximal point method for convex optimization 
with relative error tolerance 

In this section, we study the iteration complexity of a variant of the proximal point (PP) 
method for convex optimization (CO). It can be viewed as a discrete version of the continuous 
dynamical system studied in the previous sections. The main distinctive features of this 
variant are: a relative error tolerance for the solution of the proximal subproblems similar to
the ones proposed in [19, 20]; a large-step condition, as proposed in [12, 13].
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The PP method [111 [THl [T7] is a classical method for finding zeroes of maximal monotone 
operators and, in particular, for solving CO problems. It has been used as a framework for 
the analysis and design of many practical algorithms (e.g., the augmented Lagrangian, the 
proximal-gradient, or the alternating proximal minimization algorithms). The fact that its 
classical convergence analysis [18] requires the errors to be summable motivated the
introduction in [19, 20] of the Hybrid Proximal Extragradient (HPE) method, an inexact PP type
method which allows relative error tolerance in the solution of the proximal subproblems. The
relative error tolerance of the HPE was also used for minimization of semi-algebraic, or tame
functions in [3].

Consider the convex optimization problem: 

\[ \text{minimize } f(x) \quad \text{s.t. } x \in \mathcal{H}, \tag{55} \]

where $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is a (convex) proper and closed function. An exact proximal point
iteration at $x \in \mathcal{H}$ with stepsize $\lambda > 0$ consists in computing

\[ y = (I + \lambda\partial f)^{-1}(x). \]

Equivalently, for a given pair $(\lambda, x) \in\, ]0,+\infty[\, \times \mathcal{H}$, we have to compute $y \in \mathcal{H}$ such that

\[ 0 \in \lambda\partial f(y) + y - x. \]

Decoupling the latter inclusion, we are led to the following proximal inclusion-equation system:

\[ v \in \partial f(y), \qquad \lambda v + y - x = 0. \tag{56} \]

We next show how errors in both the inclusion and the equation in (56) can be handled
with an appropriate error criterion ($\partial_\varepsilon f$ stands for the classical notion of Legendre-Fenchel
$\varepsilon$-subdifferential).

Proposition 6.1. Let $x \in \mathcal{H}$, $\lambda > 0$ and $\sigma \in [0,1[$. If $y, v \in \mathcal{H}$ and $\varepsilon \ge 0$ satisfy the
conditions

\[ v \in \partial_\varepsilon f(y), \qquad \|\lambda v + y - x\|^2 + 2\lambda\varepsilon \le \sigma^2\|y - x\|^2, \tag{57} \]

then, the following statements hold:

(a) $f(x') \ge f(y) + \langle v, x' - y\rangle - \varepsilon \quad \forall x' \in \mathcal{H}$;

(b) $f(x) \ge f(y) + \dfrac{\lambda}{2}\|v\|^2 + \dfrac{1-\sigma^2}{2\lambda}\|y - x\|^2 \ge f(y)$;

(c) $(1 + \sigma)\|y - x\| \ge \|\lambda v\| \ge (1 - \sigma)\|y - x\|$;

(d) $\varepsilon \le \dfrac{\sigma^2}{2(1-\sigma)}\|v\|\,\|y - x\|$;

and

\[
\frac{\lambda}{2}\|v\|^2 + \frac{1-\sigma^2}{2\lambda}\|y - x\|^2 \ge \max\left\{ \|v\|^{3/2}\sqrt{\lambda\|y - x\|(1-\sigma)},\ \frac{1-\sigma}{\lambda}\|y - x\|^2 \right\}. \tag{58}
\]

Proof. (a) This statement follows trivially from the inclusion in (57), and the definition of
$\varepsilon$-subdifferentials.

(b) First note that the inequality in (57) is equivalent to

\[ \|\lambda v\|^2 + \|y - x\|^2 - 2\lambda\bigl[\langle v, x - y\rangle - \varepsilon\bigr] \le \sigma^2\|y - x\|^2. \]

Dividing both sides of the latter inequality by $2\lambda$, and using some trivial algebraic
manipulations, we obtain

\[ \langle v, x - y\rangle - \varepsilon \ge \frac{\lambda}{2}\|v\|^2 + \frac{1-\sigma^2}{2\lambda}\|y - x\|^2, \]

which, in turn, combined with (a) evaluated at $x' = x$, yields the first inequality in (b). To
complete the proof of (b), note that the second inequality follows trivially from the
assumptions that $\lambda > 0$ and $0 \le \sigma < 1$.

(c) Direct use of the triangle inequality yields

\[ \|y - x\| + \|\lambda v + y - x\| \ge \|\lambda v\| \ge \|y - x\| - \|\lambda v + y - x\|. \]

Since $\sigma \ge 0$, $\lambda > 0$, and $\varepsilon \ge 0$, it follows from (57) that $\|\lambda v + y - x\| \le \sigma\|y - x\|$, which in
turn combined with the latter displayed equation proves (c).

(d) In view of the inequality in (57), the second inequality in (c), and the assumption that
$\sigma < 1$, we have

\[ 2\lambda\varepsilon \le \sigma^2\|y - x\|^2 \le \frac{\sigma^2}{1-\sigma}\|\lambda v\|\,\|y - x\|, \]

which trivially gives the statement in (d).

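To complete the proof of the proposition, it remains to prove (58). To this end, first note
that, due to (c), we have $y - x = 0$ if and only if $v = 0$, in which case (58) holds trivially.
Assume now that $y - x$ and $v$ are nonzero vectors. Defining the positive scalars $\theta = \lambda\|y - x\|$,
$\mu = \lambda\|v\|/\|y - x\|$ and using (c) we conclude that

\[ 1 - \sigma \le \mu \le 1 + \sigma. \tag{59} \]

Moreover, it follows directly from the definitions of $\theta$ and $\mu$ that

\[
\frac{\lambda}{2}\|v\|^2 + \frac{1-\sigma^2}{2\lambda}\|y - x\|^2
= \|v\|^{3/2}\sqrt{\theta}\;\frac{\sqrt{\mu}}{2}\left(1 + \frac{1-\sigma^2}{\mu^2}\right).
\]

Since $t + 1/t \ge 2$ for every $t > 0$, it follows that

\[
\sqrt{\mu}\left(1 + \frac{1-\sigma^2}{\mu^2}\right) \ge 2\sqrt{\frac{1-\sigma^2}{\mu}} \ge 2\sqrt{1-\sigma},
\]

where the second inequality follows from the upper bound for $\mu$ in (59). Combining the last
two displayed equations, and using the definition of $\theta$, we obtain

\[
\frac{\lambda}{2}\|v\|^2 + \frac{1-\sigma^2}{2\lambda}\|y - x\|^2 \ge \|v\|^{3/2}\sqrt{\lambda\|y - x\|(1-\sigma)}.
\]

Likewise, using the second inequality in (c) we obtain

\[
\frac{\lambda}{2}\|v\|^2 + \frac{1-\sigma^2}{2\lambda}\|y - x\|^2
\ge \frac{(1-\sigma)^2 + 1 - \sigma^2}{2\lambda}\|y - x\|^2 = \frac{1-\sigma}{\lambda}\|y - x\|^2.
\]

To end the proof, combine the two above inequalities. □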


Note that (57) allows errors in both the inclusion and the equation in (56). Indeed, since
$\partial f(y) \subset \partial_\varepsilon f(y)$ it is easy to see that every triple $(\lambda, y, v)$ satisfying (56) also satisfies (57)
with $\varepsilon = 0$. Moreover, if $\sigma = 0$ in (57) then we have that $(\lambda, y, v)$ satisfies (56).
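
The relative error criterion (57) is straightforward to test numerically. The sketch below is a hypothetical helper (its name and the list-based vector representation are assumptions of this illustration) that evaluates the inequality in (57) for given $x$, $y$, $v$, $\lambda$, $\varepsilon$ and $\sigma$; checking the inclusion $v \in \partial_\varepsilon f(y)$ is left to the caller. The usage lines take $f(x) = \frac{1}{2}\|x\|^2$, for which $v = y$ is the gradient at $y$ and $\varepsilon = 0$.

```python
import math


def norm(u):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in u))


def satisfies_relative_error(x, y, v, lam, eps, sigma):
    """Inequality in (57): ||lam*v + y - x||^2 + 2*lam*eps <= sigma^2 * ||y - x||^2."""
    residual = [lam * vi + yi - xi for vi, yi, xi in zip(v, y, x)]
    step = [yi - xi for yi, xi in zip(y, x)]
    return norm(residual) ** 2 + 2.0 * lam * eps <= sigma ** 2 * norm(step) ** 2


# Illustration with f(x) = 0.5*||x||^2, so that v = grad f(y) = y and eps = 0.
x = [2.0, -1.0]
y_exact = [1.0, -0.5]      # exact proximal point for lam = 1: y = x / (1 + lam)
y_inexact = [1.1, -0.5]    # a perturbed candidate
exact_ok = satisfies_relative_error(x, y_exact, y_exact, 1.0, 0.0, 0.0)
loose_ok = satisfies_relative_error(x, y_inexact, y_inexact, 1.0, 0.0, 0.5)
tight_ok = satisfies_relative_error(x, y_inexact, y_inexact, 1.0, 0.0, 0.1)
```

The exact proximal point passes even with $\sigma = 0$, while the perturbed point is accepted for $\sigma = 0.5$ and rejected for $\sigma = 0.1$, which is the sense in which $\sigma$ measures the allowed relative error.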

Motivated by the above results, we will now state our method which uses approximate
solutions of (55), in the sense of Proposition 6.1.

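Algorithm 1: A Large-step PP method for convex optimization
(0) Let $x_0 \in \operatorname{dom}(f)$, $\sigma \in [0,1[$, $\theta > 0$ be given, and set $k = 1$;

(1) choose $\lambda_k > 0$, and find $x_k, v_k \in \mathcal{H}$, $\varepsilon_k \ge 0$ such that

\[ v_k \in \partial_{\varepsilon_k} f(x_k), \tag{60} \]
\[ \|\lambda_k v_k + x_k - x_{k-1}\|^2 + 2\lambda_k\varepsilon_k \le \sigma^2\|x_k - x_{k-1}\|^2, \tag{61} \]
\[ \lambda_k\|x_k - x_{k-1}\| \ge \theta \ \text{ or } \ v_k = 0; \tag{62} \]

(2) if $v_k = 0$ then STOP and output $x_k$; otherwise let $k \leftarrow k + 1$ and go to step 1.

end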
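
The following is a minimal sketch of one possible instance of Algorithm 1, under assumptions that are not part of the algorithm itself: $f(x) = \|x\|_1$ on $\mathbb{R}^n$, exact proximal steps (so $\sigma = 0$ and $\varepsilon_k = 0$), and a simple doubling rule on $\lambda_k$ to enforce the large-step condition (62). The doubling rule and all function names are hypothetical choices made for illustration; the black box in step (1) may be realized quite differently.

```python
import math


def soft_threshold(x, lam):
    """Exact proximal point of lam*||.||_1 at x (componentwise shrinkage)."""
    return [t - lam if t > lam else (t + lam if t < -lam else 0.0) for t in x]


def large_step_prox_l1(x0, theta=1.0, lam0=0.5, max_iter=100):
    """Algorithm 1 with exact prox steps for f = ||.||_1; returns (x_k, k)."""
    x, lam = list(x0), lam0
    for k in range(1, max_iter + 1):
        while True:
            y = soft_threshold(x, lam)
            step = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, x)))
            # (62): accept if lam*||x_k - x_{k-1}|| >= theta, or if v_k = 0.
            if step == 0.0 or lam * step >= theta:
                break
            lam *= 2.0          # assumed rule: double lam until (62) holds
        if step == 0.0:         # v_k = (x_{k-1} - x_k)/lam = 0: stop
            return y, k
        x = y
    return x, max_iter
```

For $x_0 = (3, -1)$, $\theta = 1$ and $\lambda_0 = 0.5$, the first step doubles $\lambda$ once, and the iterates are $(2,0)$, $(1,0)$, $(0,0)$ before the stopping test $v_k = 0$ fires at the minimizer.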


We now make some comments about Algorithm 1. First, the error tolerance (60)-(61) is
a particular case of the relative error tolerance for the HPE/Projection method introduced
in [19, 20], but here we are not performing an extragradient step, while the inequality in (62)
was introduced by Monteiro and Svaiter in [12, 13]. Second, as in the recent literature on
the HPE method, we assume that the vectors and scalars in step (1) are given by a black-box.
Concrete instances of such a black-box would depend on the particular implementation of the
method. We refer the reader to the next section, where it is shown that (in the smooth case)
a single Newton step for the proximal subproblem provides scalars and vectors satisfying all
the conditions of step (1).

From now on in this section, $\{x_k\}$, $\{v_k\}$, $\{\varepsilon_k\}$, and $\{\lambda_k\}$ are sequences generated by
Algorithm 1. These sequences may be finite or infinite. The provision for $v_k = 0$ is in (62)
because, in this case, $x_{k-1}$ is already a minimizer of $f$, as proved in the sequel.

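Proposition 6.2. For $x_0 \in \mathcal{H}$, assume that iteration $k \ge 1$ of Algorithm 1 is reached (so
that $\lambda_k$, $x_k$, $v_k$ and $\varepsilon_k$ are generated). Then, the following statements hold:

(a) $f(x') \ge f(x_k) + \langle v_k, x' - x_k\rangle - \varepsilon_k \quad \forall x' \in \mathcal{H}$;

(b) $f(x_{k-1}) \ge f(x_k) + \dfrac{\lambda_k}{2}\|v_k\|^2 + \dfrac{1-\sigma^2}{2\lambda_k}\|x_k - x_{k-1}\|^2 \ge f(x_k)$;

(c) $(1 + \sigma)\|x_k - x_{k-1}\| \ge \|\lambda_k v_k\| \ge (1 - \sigma)\|x_k - x_{k-1}\|$;

(d) $\varepsilon_k \le \dfrac{\sigma^2}{2(1-\sigma)}\|v_k\|\,\|x_k - x_{k-1}\|$;

and

\[
\frac{\lambda_k}{2}\|v_k\|^2 + \frac{1-\sigma^2}{2\lambda_k}\|x_k - x_{k-1}\|^2 \ge \max\left\{ \|v_k\|^{3/2}\sqrt{\theta(1-\sigma)},\ \frac{1-\sigma}{\lambda_k}\|x_k - x_{k-1}\|^2 \right\}. \tag{63}
\]

(e) Suppose $\inf f > -\infty$. Then $\sum_k 1/\lambda_k^3 < +\infty$; as a consequence, if the sequences $\{\lambda_k\}$, $\{x_k\}$
etc. are infinite, then $\lambda_k \to +\infty$ as $k \to \infty$.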

Proof. Items (a), (b), (c), and (d) follow directly from Proposition 6.1 and the definition of
Algorithm 1. To prove (e), first notice that (b) implies, for any $j \ge 1$

\[ f(x_{j-1}) \ge f(x_j) + \frac{1-\sigma^2}{2\lambda_j}\|x_j - x_{j-1}\|^2. \]

Summing this inequality from $j = 1$ to $k$, we obtain

\[ f(x_0) \ge f(x_k) + \frac{1-\sigma^2}{2}\sum_{j=1}^{k}\frac{\|x_j - x_{j-1}\|^2}{\lambda_j}. \]

Note that, in order for Algorithm 1 to be defined, we need to take $x_0 \in \operatorname{dom} f$, i.e., $f(x_0) < +\infty$.
Since, by assumption, $\inf f > -\infty$, and $\sigma < 1$, we deduce that

\[ \sum_{k}\frac{\|x_k - x_{k-1}\|^2}{\lambda_k} < +\infty. \tag{64} \]

On the other hand, by definition of Algorithm 1, (62), we have $\lambda_k\|x_k - x_{k-1}\| \ge \theta$.
Equivalently, $\|x_k - x_{k-1}\|^2 \ge \theta^2/\lambda_k^2$. Combining this inequality with (64), and $\theta > 0$, we obtain

\[ \sum_{k}\frac{1}{\lambda_k^3} < +\infty. \tag{65} \]

□

Suppose now that Algorithm 1 generates infinite sequences. Any convergence result valid 
under this assumption is valid in the general case, with the provision “or a solution is reached 
in a finite number of iterations”. We are ready to analyze the (global) rate of convergence 
and the iteration complexity of Algorithm 1. To this end, let $D_0$ be the diameter of the level
set $[f \le f(x_0)]$, that is,

\[ D_0 = \sup\{\|x - y\| \;|\; \max\{f(x), f(y)\} \le f(x_0)\}. \tag{66} \]

Theorem 6.3. Assume that $D_0 < \infty$, let $\bar x$ be a solution of (55) and define

\[ D = D_0\left[1 + \frac{\sigma^2}{2(1-\sigma)}\right], \qquad \kappa = \sqrt{\frac{\theta(1-\sigma)}{D^3}}. \tag{67} \]

Then, the following statements hold for every $k \ge 1$:

(a) $\|v_k\| D \ge f(x_k) - f(\bar x)$;

(b) $f(x_k) \le f(x_{k-1}) - \dfrac{2\kappa\,(f(x_{k-1}) - f(\bar x))^{3/2}}{2 + 3\kappa\,(f(x_{k-1}) - f(\bar x))^{1/2}}$;

(c) $f(x_k) - f(\bar x) \le \dfrac{f(x_0) - f(\bar x)}{\left[ 1 + k\,\dfrac{\kappa\sqrt{f(x_0) - f(\bar x)}}{2 + 3\kappa\sqrt{f(x_0) - f(\bar x)}} \right]^2} = O(1/k^2)$.
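Moreover, for each $k \ge 2$ even, there exists $j \in \{k/2 + 1, \ldots, k\}$ such that

\[
\|v_j\| \le \left[ \frac{4}{k\sqrt{\theta(1-\sigma)}}\; \frac{f(x_0) - f(\bar x)}{\left[ 2 + k\,\dfrac{\kappa\sqrt{f(x_0) - f(\bar x)}}{2 + 3\kappa\sqrt{f(x_0) - f(\bar x)}} \right]^2} \right]^{2/3} = O(1/k^2) \tag{68}
\]

and

\[
\varepsilon_j \le \frac{4\sigma^2}{(1-\sigma)\,k}\; \frac{f(x_0) - f(\bar x)}{\left[ 2 + k\,\dfrac{\kappa\sqrt{f(x_0) - f(\bar x)}}{2 + 3\kappa\sqrt{f(x_0) - f(\bar x)}} \right]^2} = O(1/k^3). \tag{69}
\]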


Proof. (a) In view of Proposition 6.2(b) and the fact that $\bar x$ is a solution of (55) we have
$\max\{f(x_k), f(\bar x)\} \le f(x_0)$ for all $k \ge 0$. As a consequence of the latter inequality and (66)
we find

\[ \max\{\|x_k - \bar x\|, \|x_k - x_{k-1}\|\} \le D_0 \quad \forall k \ge 1. \tag{70} \]

Using Proposition 6.2(a) with $x' = \bar x$, Proposition 6.2(d) and the Cauchy-Schwarz inequality
we conclude that

\[
f(x_k) - f(\bar x) \le \langle v_k, x_k - \bar x\rangle + \varepsilon_k \le \|v_k\|\,\|x_k - \bar x\| + \frac{\sigma^2}{2(1-\sigma)}\|v_k\|\,\|x_k - x_{k-1}\| \quad \forall k \ge 1,
\]

which in turn combined with (70) and the definition of $D$ in (67) proves (a).

(b) By Proposition 6.2(b), (63), the above item (a), and the definition of $\kappa$ in (67) we have
for all $k \ge 1$:

\[
f(x_{k-1}) - f(\bar x) \ge f(x_k) - f(\bar x) + \|v_k\|^{3/2}\sqrt{\theta(1-\sigma)}
\ge f(x_k) - f(\bar x) + \kappa\,(f(x_k) - f(\bar x))^{3/2}. \tag{71}
\]

Using the latter inequality and Lemma 5.4 (for each $k \ge 1$) with $b = f(x_k) - f(\bar x)$,
$a = f(x_{k-1}) - f(\bar x)$ and $c = \kappa$ we obtain

\[
f(x_k) - f(\bar x) \le f(x_{k-1}) - f(\bar x) - \frac{\kappa\,(f(x_{k-1}) - f(\bar x))^{3/2}}{1 + (3\kappa/2)\,(f(x_{k-1}) - f(\bar x))^{1/2}} \tag{72}
\]

which in turn proves (b).

(c) Defining $a_k := f(x_k) - f(\bar x)$, $\tau := 2\kappa/(2 + 3\kappa\sqrt{a_0})$, and using the second inequality in
Proposition 6.2(b) and (72), we conclude that

\[ a_k \le a_{k-1} - \tau\,a_{k-1}^{3/2} \quad \forall k \ge 1, \]

which leads to (c), by direct application of Lemma A.1 (see Appendix).

To prove the last statement of the theorem, assume that $k \ge 2$ is even. Using the first
inequality in Proposition 6.2(b), we obtain

\[
f(x_{k/2}) - f(x_k) = \sum_{i=k/2+1}^{k} \bigl[f(x_{i-1}) - f(x_i)\bigr] \ge \sum_{i=k/2+1}^{k} \left[ \frac{\lambda_i}{2}\|v_i\|^2 + \frac{1-\sigma^2}{2\lambda_i}\|x_i - x_{i-1}\|^2 \right]. \tag{73}
\]

Taking $j \in \{k/2 + 1, \ldots, k\}$ which minimizes the general term in the second sum of the latter
inequality, and using the fact that $\bar x$ is a solution of (55), we have

\[
f(x_{k/2}) - f(\bar x) \ge \frac{k}{2}\left[ \frac{\lambda_j}{2}\|v_j\|^2 + \frac{1-\sigma^2}{2\lambda_j}\|x_j - x_{j-1}\|^2 \right],
\]

which, in turn, combined with (63) and (61) gives

\[
\frac{f(x_{k/2}) - f(\bar x)}{k/2} \ge \max\left\{ \|v_j\|^{3/2}\sqrt{\theta(1-\sigma)},\ \frac{1-\sigma}{\lambda_j}\|x_j - x_{j-1}\|^2 \right\}
\ge \max\left\{ \|v_j\|^{3/2}\sqrt{\theta(1-\sigma)},\ \frac{2(1-\sigma)}{\sigma^2}\,\varepsilon_j \right\}.
\]

Combining the latter inequality with (c), and using some trivial algebraic manipulations, we
obtain (68) and (69), which finishes the proof of the theorem. □

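We now prove that if $\varepsilon_k = 0$ in Algorithm 1, then better complexity bounds can be
obtained.

Theorem 6.4. Assume that $D_0 < \infty$, and $\varepsilon_k = 0$ for all $k \ge 1$. Let $\bar x$ be a solution of (55)
and define

\[ \kappa_0 = \sqrt{\frac{\theta(1-\sigma)}{D_0^3}}. \]

Then, the following statements hold for all $k \ge 1$:

(a) $\|v_k\| D_0 \ge f(x_k) - f(\bar x)$;

(b) $f(x_k) \le f(x_{k-1}) - \dfrac{2\kappa_0\,(f(x_{k-1}) - f(\bar x))^{3/2}}{2 + 3\kappa_0\,(f(x_{k-1}) - f(\bar x))^{1/2}}$;

(c) $f(x_k) - f(\bar x) \le \dfrac{f(x_0) - f(\bar x)}{\left[ 1 + k\,\dfrac{\kappa_0\sqrt{f(x_0) - f(\bar x)}}{2 + 3\kappa_0\sqrt{f(x_0) - f(\bar x)}} \right]^2} = O(1/k^2)$.

Moreover, for each $k \ge 2$ even, there exists $j \in \{k/2 + 1, \ldots, k\}$ such that

\[
\|v_j\| \le \left[ \frac{4}{k\sqrt{\theta(1-\sigma)}}\; \frac{f(x_0) - f(\bar x)}{\left[ 2 + k\,\dfrac{\kappa_0\sqrt{f(x_0) - f(\bar x)}}{2 + 3\kappa_0\sqrt{f(x_0) - f(\bar x)}} \right]^2} \right]^{2/3} = O(1/k^2). \tag{74}
\]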

Proof. By the same reasoning as in the proof of Theorem 6.3(a) we obtain (70) and

\[ f(x_k) - f(\bar x) \le \|v_k\|\,\|x_k - \bar x\| \le \|v_k\| D_0 \quad \forall k \ge 1, \]

which in turn proves (a). Using (a), the definition of $\kappa_0$, and the same reasoning as in the
proof of Theorem 6.3(b) we deduce that (71) holds with $\kappa_0$ in the place of $\kappa$. The rest of the
proof is analogous to that of Theorem 6.3. □


In the next corollary, we prove that Algorithm 1 is able to find approximate solutions of
the problem (55) in at most $O(1/\sqrt{\varepsilon})$ iterations.

Corollary 6.5. Assume that all the assumptions of Theorem 6.4 hold, and let $\varepsilon > 0$ be a
given tolerance. Define

2 + 3Ko^yf{xo) - f{x) 2 (2 + 3koV/(xo) - 

Then, the following statements hold:

(a) for any $k \ge K$, $f(x_k) - f(\bar x) \le \varepsilon$;

(b) there exists $j \le 2\lceil J \rceil$ such that $\|v_j\| \le \varepsilon$.

Proof. The proof of (a) and (b) follows trivially from Theorem 6.4(c) and (74), respectively,
and from (75). □


7 An $O(1/\sqrt{\varepsilon})$ proximal-Newton method for smooth convex
optimization

In this section, we consider a proximal-Newton method for solving the convex optimization
problem

\[ \text{minimize } f(x) \quad \text{s.t. } x \in \mathcal{H}, \tag{76} \]

where $f : \mathcal{H} \to \mathbb{R}$, and the following assumptions are made:

AS1) $f$ is convex and twice continuously differentiable;

AS2) the Hessian of $f$ is $L$-Lipschitz continuous, that is, there exists $L > 0$ such that

\[ \|\nabla^2 f(x) - \nabla^2 f(y)\| \le L\|x - y\| \quad \forall x, y \in \mathcal{H} \]

where, at the left hand-side, the operator norm is induced by the Hilbert norm of $\mathcal{H}$;

AS3) there exists a solution of (76).

Remark. It follows from Assumptions AS1 and AS2 that $\nabla^2 f(x)$ exists and is positive
semidefinite (psd) for all $x \in \mathcal{H}$, while it follows from Assumption AS2 that

\[ \|\nabla f(y) - \nabla f(x) - \nabla^2 f(x)(y - x)\| \le \frac{L}{2}\|y - x\|^2 \quad \forall x, y \in \mathcal{H}. \tag{77} \]

Using assumption AS1, we have that an exact proximal point iteration at $x \in \mathcal{H}$, with
stepsize $\lambda > 0$, consists in finding $y \in \mathcal{H}$ such that

\[ \lambda\nabla f(y) + y - x = 0 \quad \text{(cf. (56))}. \tag{78} \]

The basic idea of our method is to perform a single Newton iteration for the above equation
from the current iterate $x$, i.e., in computing the (unique) solution $y$ of the linear system

\[ \lambda\bigl(\nabla f(x) + \nabla^2 f(x)(y - x)\bigr) + y - x = 0, \]

and defining the new iterate as such $y$. We will show that, due to (77), it is possible to choose
$\lambda$ so that: a) condition (57) is satisfied with $\varepsilon = 0$ and $v = \nabla f(y)$; b) a large-step type
condition (see (62)) is satisfied for $\lambda$, $x$ and $y$. First we show that the Newton step is well defined
and find bounds for its norm.


Lemma 7.1. For any $x \in \mathcal{H}$, if $\lambda > 0$ then $\lambda\nabla^2 f(x) + I$ is nonsingular and

\[
\frac{\lambda\|\nabla f(x)\|}{\lambda\|\nabla^2 f(x)\| + 1} \le \|(\lambda\nabla^2 f(x) + I)^{-1}\lambda\nabla f(x)\| \le \lambda\|\nabla f(x)\|. \tag{79}
\]

Proof. Non-singularity of $\lambda\nabla^2 f(x) + I$, as well as the inequalities in (79), are due to the facts
that $\lambda > 0$, $\nabla^2 f(x)$ is psd (see the remark after Assumption AS3), and the definition of
the operator norm. □


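The next result provides a priori bounds for the (relative) residual in (78) after a Newton
iteration from $x$ for this equation.

Lemma 7.2. For any $x \in \mathcal{H}$, if $\lambda > 0$, $\sigma > 0$, and

\[ y = x - (\lambda\nabla^2 f(x) + I)^{-1}\lambda\nabla f(x), \qquad \lambda\|(\lambda\nabla^2 f(x) + I)^{-1}\lambda\nabla f(x)\| \le \frac{2\sigma}{L}, \tag{80} \]

then $\|\lambda\nabla f(y) + y - x\| \le \sigma\|y - x\|$.

Proof. It follows from (80) that

\[ \lambda\nabla f(y) + y - x = \lambda\nabla f(y) - \lambda\bigl[\nabla f(x) + \nabla^2 f(x)(y - x)\bigr], \qquad \lambda\|y - x\| \le \frac{2\sigma}{L}. \]

Therefore

\[ \|\lambda\nabla f(y) + y - x\| = \lambda\|\nabla f(y) - \nabla f(x) - \nabla^2 f(x)(y - x)\| \le \frac{\lambda L}{2}\|y - x\|^2 \le \sigma\|y - x\|, \]

where the first inequality follows from (77). □

Lemma 7.3. For any $x \in \mathcal{H}$, and $0 < \sigma_\ell < \sigma_u < +\infty$, if $\nabla f(x) \ne 0$ then the set of all
scalars $\lambda \in\, ]0,+\infty[$ satisfying

\[ \frac{2\sigma_\ell}{L} \le \lambda\|(\lambda\nabla^2 f(x) + I)^{-1}\lambda\nabla f(x)\| \le \frac{2\sigma_u}{L} \tag{81} \]

is a (nonempty) closed interval $[\lambda_\ell, \lambda_u] \subset\, ]0,+\infty[$,

\[
\sqrt{\frac{2\sigma_\ell/L}{\|\nabla f(x)\|}} \le \lambda_\ell, \qquad
\lambda_u \le \frac{\dfrac{\|\nabla^2 f(x)\|\sigma_u}{L} + \sqrt{\left(\dfrac{\|\nabla^2 f(x)\|\sigma_u}{L}\right)^2 + \|\nabla f(x)\|\dfrac{2\sigma_u}{L}}}{\|\nabla f(x)\|}, \tag{82}
\]

and $\lambda_u/\lambda_\ell \ge \sqrt{\sigma_u/\sigma_\ell}$.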

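Proof. Assume that $\nabla f(x)$ is nonzero. Define the operator $A : \mathcal{H} \to \mathcal{H}$ by $A(y) = \nabla f(x) +
\nabla^2 f(x)(y - x)$. Since $\nabla^2 f(x)$ is psd, it follows that the affine linear operator $A$ is maximal
monotone. It can be easily checked that, in this setting,

\[ J_\lambda(x) = x - (\lambda\nabla^2 f(x) + I)^{-1}\lambda\nabla f(x), \qquad \varphi(\lambda, x) = \lambda\|(\lambda\nabla^2 f(x) + I)^{-1}\lambda\nabla f(x)\|, \tag{83} \]

(see (6) and the surrounding discussion to recall the notation). Hence, using Proposition 1.4 we
conclude that there exist $0 < \lambda_\ell < \lambda_u < \infty$ such that

\[ \varphi(\lambda_\ell, x) = \frac{2\sigma_\ell}{L}, \qquad \varphi(\lambda_u, x) = \frac{2\sigma_u}{L}, \tag{84} \]

and the set of all scalars satisfying (81) is the closed interval $[\lambda_\ell, \lambda_u] \subset\, ]0,+\infty[$. It follows from
the second inequality in (13) and the above (implicit) definitions of $\lambda_\ell$ and $\lambda_u$ that

\[ \frac{2\sigma_u}{L} = \varphi(\lambda_u, x) \le \left(\frac{\lambda_u}{\lambda_\ell}\right)^2 \varphi(\lambda_\ell, x) = \left(\frac{\lambda_u}{\lambda_\ell}\right)^2 \frac{2\sigma_\ell}{L}, \]

which trivially implies that $\lambda_u/\lambda_\ell \ge \sqrt{\sigma_u/\sigma_\ell}$. To prove the two inequalities in (82), first
observe that, in view of the expression (83) for $\varphi(\lambda, x)$, and Lemma 7.1 we have

\[ \frac{\lambda^2\|\nabla f(x)\|}{\lambda\|\nabla^2 f(x)\| + 1} \le \varphi(\lambda, x) \le \lambda^2\|\nabla f(x)\|. \]

Then, evaluate these inequalities for $\lambda = \lambda_\ell$, $\lambda = \lambda_u$, and use the above implicit expression
(84) for $\lambda_\ell$ and $\lambda_u$. □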

Motivated by the above results, we propose the following algorithm for solving (76). This
algorithm is the main object of study in this section. We will prove that, for a given tolerance
$\varepsilon > 0$, it is able to find approximate solutions of (76) in at most $O(1/\sqrt{\varepsilon})$ iterations, i.e.,
it has the same complexity as the cubic regularization of the Newton method proposed and
studied in HU.
Algorithm 2: A proximal-Newton method for convex optimization

(0) Let $x_0 \in \mathcal{H}$, $0 < \sigma_\ell < \sigma_u < 1$ be given, and set $k = 1$;

(1) if $\nabla f(x_{k-1}) = 0$ then stop. Otherwise, compute $\lambda_k > 0$ such that

\[ \frac{2\sigma_\ell}{L} \le \lambda_k\|(I + \lambda_k\nabla^2 f(x_{k-1}))^{-1}\lambda_k\nabla f(x_{k-1})\| \le \frac{2\sigma_u}{L}; \tag{85} \]

(2) set $x_k = x_{k-1} - (I + \lambda_k\nabla^2 f(x_{k-1}))^{-1}\lambda_k\nabla f(x_{k-1})$;

(3) set $k \leftarrow k + 1$ and go to step 1.

end


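Remark. We note that, for a given $\lambda_k > 0$, iterate $x_k$, defined in step (2) of Algorithm 2, is
the solution of the quadratic problem

\[
\min_{x \in \mathcal{H}}\ f(x_{k-1}) + \langle \nabla f(x_{k-1}), x - x_{k-1}\rangle + \frac{1}{2}\langle x - x_{k-1}, \nabla^2 f(x_{k-1})(x - x_{k-1})\rangle + \frac{1}{2\lambda_k}\|x - x_{k-1}\|^2.
\]

Hence, our method is based on classical quadratic regularizations of quadratic local models
for $f$, combined with a large-step type condition.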

At iteration $k$, we must find $\lambda_k \in [\lambda_\ell, \lambda_u]$, where

\[ \varphi(\lambda_\ell, x_{k-1}) = \frac{2\sigma_\ell}{L}, \qquad \varphi(\lambda_u, x_{k-1}) = \frac{2\sigma_u}{L}. \]

Lemma 7.3 provides a lower and an upper bound for $\lambda_\ell$ and $\lambda_u$ respectively, and guarantees
that the length of the interval $[\log\lambda_\ell, \log\lambda_u]$ is no smaller than $\log(\sigma_u/\sigma_\ell)/2$. A binary
search in $\log\lambda$ may be used for finding $\lambda_k$. The complexity of such a procedure was analysed
in [12, 13], in the context of the HPE method. The possible improvement of this procedure
is a subject of future research.
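
A minimal one-dimensional sketch of Algorithm 2 with a binary search in $\log\lambda$ is given below. The test function $f(x) = \log\cosh x$, the value $L = 0.77$ (an upper bound on $\sup|f'''| = 4/(3\sqrt{3})$), the gradient tolerance replacing the exact test $\nabla f(x_{k-1}) = 0$, and all function names are assumptions of this illustration. The initial bracket for the search is taken from the bounds (82) of Lemma 7.3.

```python
import math

L_HESS = 0.77  # assumed Lipschitz constant of f'' for f(x) = log(cosh(x))


def grad(x):
    return math.tanh(x)


def hess(x):
    return 1.0 / math.cosh(x) ** 2


def phi(lam, g, h):
    """phi(lam, x) = lam * |(lam*h + 1)^{-1} * lam * g|, cf. (83), in dimension one."""
    return lam * abs(lam * g / (1.0 + lam * h))


def find_lambda(g, h, sigma_l, sigma_u):
    """Bisection in log(lam) for a step size satisfying the two-sided condition (85)."""
    lo_target, hi_target = 2.0 * sigma_l / L_HESS, 2.0 * sigma_u / L_HESS
    ag = abs(g)
    lo = math.sqrt(lo_target / ag)                          # lower bound in (82)
    c = h * sigma_u / L_HESS
    hi = (c + math.sqrt(c * c + ag * hi_target)) / ag       # upper bound in (82)
    for _ in range(200):
        mid = math.sqrt(lo * hi)                            # midpoint in log scale
        val = phi(mid, g, h)
        if val < lo_target:
            lo = mid
        elif val > hi_target:
            hi = mid
        else:
            return mid
    raise RuntimeError("bisection did not terminate")


def proximal_newton(x0, sigma_l=0.25, sigma_u=0.5, tol=1e-12, max_iter=50):
    """Algorithm 2 in 1-D; stops when |f'(x)| <= tol (assumed stopping rule)."""
    x, history = x0, [x0]
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if abs(g) <= tol:
            break
        lam = find_lambda(g, h, sigma_l, sigma_u)
        x = x - lam * g / (1.0 + lam * h)                   # step (2)
        history.append(x)
    return x, history
```

Because $\varphi(\cdot, x)$ is increasing and the admissible interval has positive length in $\log\lambda$, the bisection stops after finitely many halvings; starting from $x_0 = 2$ the iterates approach the minimizer $x^* = 0$ with $|x_k|$ decreasing.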

Proposition 7.4. For $x_0 \in \mathcal{H}$ and $0 < \sigma_\ell < \sigma_u < 1$, consider the sequences $\{\lambda_k\}$ and $\{x_k\}$
generated by Algorithm 2 and define

\[ \sigma = \sigma_u, \quad \theta = 2\sigma_\ell/L, \quad v_k = \nabla f(x_k), \quad \varepsilon_k = 0 \quad \forall k \ge 1. \tag{86} \]

Then, the following statements hold for every $k \ge 1$:

(a) $v_k \in \partial_{\varepsilon_k} f(x_k)$, $\|\lambda_k v_k + x_k - x_{k-1}\| \le \sigma\|x_k - x_{k-1}\|$;

(b) $\lambda_k\|x_k - x_{k-1}\| \ge \theta$;

(c) $\lambda_k \ge \sqrt{\dfrac{\sigma_\ell}{(1+\sigma_u)\sigma_u}}\;\lambda_{k-1}$;

(d) $v_k$ is nonzero whenever $v_0$ is nonzero.

As a consequence, Algorithm 2 is a special instance of Algorithm 1, with $\sigma$, $\theta$ and the sequences
$\{v_k\}$ and $\{\varepsilon_k\}$ given by (86).

Proof. (a) First note that the inclusion in (a) follows trivially from the definition of $v_k$ and
$\varepsilon_k$ in (86). Moreover, using the definitions of $\sigma$ and $v_k$ in (86), the second inequality in (85), the
definition of $x_k$ in step 2 of Algorithm 2, and Lemma 7.2 with $\lambda = \lambda_k$, $y = x_k$ and $x = x_{k-1}$
we obtain

\[ \|\lambda_k v_k + x_k - x_{k-1}\| = \|\lambda_k\nabla f(x_k) + x_k - x_{k-1}\| \le \sigma\|x_k - x_{k-1}\|, \]

which concludes the proof of (a).

(b) The statement in (b) follows easily from the definition of $x_k$ and $\theta$ in step 2 of Algorithm
2 and (86), respectively, and the first inequality in (85).

(c) Using the definition of Algorithm 2, item (a), and Lemma 7.1 with $\lambda = \lambda_k$, $x = x_{k-1}$ we
have, for all $k \ge 1$

\[ \lambda_k\|\nabla f(x_k)\| \le (1 + \sigma_u)\|(\lambda_k\nabla^2 f(x_{k-1}) + I)^{-1}\lambda_k\nabla f(x_{k-1})\| \le (1 + \sigma_u)\lambda_k\|\nabla f(x_{k-1})\|. \tag{87} \]

Set $s_k = -(\lambda_k\nabla^2 f(x_{k-1}) + I)^{-1}\lambda_k\nabla f(x_{k-1})$. Note now that (85) and the definition of $s_k$
imply that $2\sigma_\ell/L \le \|\lambda_j s_j\| \le 2\sigma_u/L$ for all $j = 1, \ldots, k$. Direct use of the latter inequalities
for $j = k - 1$ and $j = k$, and the multiplication of the second inequality in the latter displayed
equation by $\lambda_{k-1}^2\lambda_k$ yield

\[
\lambda_{k-1}^2(2\sigma_\ell)/L \le \lambda_{k-1}^2\lambda_k^2\|\nabla f(x_{k-1})\| = \lambda_k^2\lambda_{k-1}\|\lambda_{k-1}\nabla f(x_{k-1})\|
\le (1 + \sigma_u)\lambda_k^2\|\lambda_{k-1}s_{k-1}\|
\le (1 + \sigma_u)\lambda_k^2(2\sigma_u)/L,
\]

and, hence, the inequality in (c).

(d) To prove this statement observe that if $\nabla f(x_{k-1}) \ne 0$ then $x_k \ne x_{k-1}$, and use item
(a), the second inequality in item (c) of Proposition 6.1 and induction in $k$. □

Now we make an additional assumption in order to derive complexity estimates for the
sequence generated by Algorithm 2.

AS4) The level set $\{x \in \mathcal{H} \;|\; f(x) \le f(x_0)\}$ is bounded, and $D_0$ is its diameter, that is,

\[ D_0 = \sup\{\|y - x\| \;|\; \max\{f(x), f(y)\} \le f(x_0)\} < \infty. \]

Theorem 7.5. Assume that assumptions AS1, AS2, AS3, AS4 hold, and consider the
sequence $\{x_k\}$ generated by Algorithm 2. Let $\bar x$ be a solution of (76) and, for any given tolerance
$\varepsilon > 0$ define

/ _N 2/3 

_ j 2ae{l - Q-J' _ 2 + 3/to^/(xo) - /(x) 2L^ ^ {2 + Supy/fixp) - /(x)j 

Then, the following statements hold for every $k \ge 1$:

(a) for any $k \ge K$, $f(x_k) - f(\bar x) \le \varepsilon$;

(b) there exists $j \le 2\lceil J \rceil$ such that $\|\nabla f(x_j)\| \le \varepsilon$.

Proof. The proof follows from the last statement of Proposition 7.4 and Corollary 6.5. □

In practical implementations of Algorithm 2, as in other Newton methods, the main
iteration is divided into two steps: the computation of a Newton step $s_k$,

\[ s_k = -(\lambda_k\nabla^2 f(x_{k-1}) + I)^{-1}\lambda_k\nabla f(x_{k-1}), \]

and the update $x_k = x_{k-1} + s_k$. As in other Newton methods, step $s_k$ is not to be computed
using the inverse of $\lambda_k\nabla^2 f(x_{k-1}) + I$. Instead, the linear system

\[ (\nabla^2 f(x_{k-1}) + \mu_k I)\,s_k = -\nabla f(x_{k-1}), \qquad \mu_k = 1/\lambda_k \]

is solved via a Hessenberg factorization (followed by a Cholesky factorization), a Cholesky
factorization, or a conjugate gradient method. Some reasons for choosing a Hessenberg
factorization are discussed in [12]. For large and dense linear systems, conjugate gradient is the
method of choice, and it is used as an iterative procedure. In these cases, the linear system is
not solved (exactly). Even for Hessenberg and Cholesky factorization, ill-conditioned linear
systems are inexactly solved with a non-negligible error.

Since $\lambda_k \to \infty$, $\mu_k \to 0$ and, in spite of the regularizing term $\mu_k I$, ill-conditioned systems
may occur. For these reasons, it may be interesting to consider a variant of Algorithm 2 where
an “inexact” Newton step is used, see m for the development of this method in the context 
of the HPE method. 
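
As a small illustration of solving $(\nabla^2 f(x_{k-1}) + \mu_k I)s_k = -\nabla f(x_{k-1})$ without forming an inverse, the sketch below uses a textbook dense Cholesky factorization followed by two triangular solves. It is written in plain Python for small dense matrices stored as lists of lists; the function names are hypothetical, and a production code would call an optimized linear algebra library instead.

```python
import math


def cholesky(a):
    """Lower-triangular factor l with l * l^T = a, for a symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def regularized_newton_step(hessian, gradient, lam):
    """Solve (hessian + mu*I) s = -gradient with mu = 1/lam via Cholesky."""
    n = len(gradient)
    mu = 1.0 / lam
    a = [[hessian[i][j] + (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
    l = cholesky(a)
    # Forward substitution: l z = -gradient.
    z = [0.0] * n
    for i in range(n):
        z[i] = (-gradient[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    # Backward substitution: l^T s = z.
    s = [0.0] * n
    for i in reversed(range(n)):
        s[i] = (z[i] - sum(l[k][i] * s[k] for k in range(i + 1, n))) / l[i][i]
    return s
```

Since the Hessian is psd and $\mu_k > 0$, the shifted matrix is positive definite and the factorization exists; as noted above, its conditioning may still deteriorate as $\mu_k \to 0$.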

7.1 Quadratic convergence in the regular case 

In this section, we will analyze Algorithm 2 under the assumption:

AS3r) there exists a unique $x^*$ solution of (76), and $\nabla^2 f(x^*)$ is non-singular.

Theorem 7.6. Let us make assumptions AS1, AS2, and AS3r. Then, the sequence $\{x_k\}$
generated by Algorithm 2 converges quadratically to $x^*$, the unique solution of (76).

Proof. Let $M := \|\nabla^2 f(x^*)^{-1}\|$. For any $M' > M$ there exists $r_0 > 0$ such that

\[ x \in B(x^*, r_0) \implies \nabla^2 f(x) \text{ is non-singular}, \quad \|\nabla^2 f(x)^{-1}\| \le M'. \]

Since $\{f(x_k)\}$ converges to $f(x^*)$, it follows from assumptions AS1 and AS3r that $x_k \to x^*$
as $k \to \infty$; therefore, there exists $k_0$ such that

\[ \|x^* - x_k\| < r_0 \quad \text{for } k \ge k_0. \]

Define, for $k > k_0$, $s_k$, $s_k^N$, and $s_k^*$ as

\[ s_k = -(I + \lambda_k\nabla^2 f(x_{k-1}))^{-1}\lambda_k\nabla f(x_{k-1}), \quad s_k^N = -\nabla^2 f(x_{k-1})^{-1}\nabla f(x_{k-1}), \quad s_k^* = x^* - x_{k-1}. \]

Observe that $s_k$ is the step of Algorithm 2 at $x_{k-1}$, and $s_k^N$ is Newton's step for (76) at $x_{k-1}$.
Define also

\[ w_k = \nabla^2 f(x_{k-1})(s_k^*) + \nabla f(x_{k-1}) = \nabla^2 f(x_{k-1})(x^* - x_{k-1}) + \nabla f(x_{k-1}). \]

Since $\nabla f(x^*) = 0$, it follows from assumption AS2 that $\|w_k\| \le L\|s_k^*\|^2/2$. Hence

\[ \|s_k^* - s_k^N\| = \|\nabla^2 f(x_{k-1})^{-1}w_k\| \le \frac{M'L}{2}\|s_k^*\|^2. \tag{88} \]

Let us now observe that

\[ \|s_k\| \le \|s_k^N\|. \]

This is a direct consequence of the definition of $s_k$, $s_k^N$, and the monotonicity property of
$\nabla^2 f(x_{k-1})$. By the two above relations, and the triangle inequality we deduce that

\[ \|s_k\| \le \|s_k^N\| \le \|s_k^*\| + \|s_k^N - s_k^*\| \le \|s_k^*\|\left(1 + \frac{M'L}{2}\|s_k^*\|\right). \]

The first inequality in (85) is, in the above notation, $2\sigma_\ell/L \le \lambda_k\|s_k\|$. Therefore,

\[ \lambda_k^{-1} \le \frac{L}{2\sigma_\ell}\|s_k\|. \tag{89} \]

It follows from the above definitions that

\[ \nabla^2 f(x_{k-1})s_k + \lambda_k^{-1}s_k + \nabla f(x_{k-1}) = 0, \qquad \nabla^2 f(x_{k-1})s_k^N + \nabla f(x_{k-1}) = 0. \]

Hence $\nabla^2 f(x_{k-1})(s_k^N - s_k) = \lambda_k^{-1}s_k$, which gives, by (89)

\[ \|s_k^N - s_k\| \le M'\lambda_k^{-1}\|s_k\| \le \frac{M'L}{2\sigma_\ell}\|s_k\|^2. \tag{90} \]

Combining (88) with (90), we finally obtain

\[
\|x^* - x_k\| = \|s_k^* - s_k\| \le \|s_k^* - s_k^N\| + \|s_k^N - s_k\|
\le \frac{M'L}{2}\left(\|s_k^*\|^2 + \frac{1}{\sigma_\ell}\|s_k\|^2\right)
\le \frac{M'L}{2}\left(1 + \frac{1}{\sigma_\ell}\left(1 + \frac{M'L}{2}\|s_k^*\|\right)^2\right)\|x^* - x_{k-1}\|^2.
\]

□


8 Concluding remarks 

The proximal point method is a basic block of several algorithms and splitting methods in
optimization, such as proximal-gradient methods, Gauss-Seidel alternating proximal
minimization, and augmented Lagrangian methods. Among others, it has been successfully applied to
sparse optimization in signal/image processing, machine learning, inverse problems in physics, and domain
decomposition for PDEs. In these situations, we are faced with problems of high
dimension, and it is crucial to develop fast methods. In this paper, we have laid the
theoretical foundations for a new fast proximal method. It is based on a large step condition.
For convex minimization problems, its complexity is $O(1/n^2)$, and global quadratic convergence
holds in the regular case for the associated proximal-Newton method. It can be considered
as a discrete version of a regularized Newton continuous dynamical system. Many interesting
theoretical points still remain to be investigated, such as obtaining fast convergence results for
maximal monotone operators which are not subdifferentials, the combination of the method
with classical proximal based algorithms, and duality methods, as mentioned above. The
implementation of the method on concrete examples is a subject for further research.

A Appendix

A.1 A discrete differential inequality

Lemma A.1. Let $\{a_k\}$ be a sequence of non-negative real numbers and let $\tau > 0$ be such that
$\tau\sqrt{a_0} \le 1$. If $a_k \le a_{k-1} - \tau a_{k-1}^{3/2}$ for all $k \ge 1$, then

\[ a_k \le \frac{a_0}{\big[ 1 + k\tau\sqrt{a_0}/2 \big]^2}. \]


Proof. Since $\{a_k\}$ is non-increasing, it follows that $a_k = 0$ implies $a_{k+1} = a_{k+2} = \cdots = 0$ and,
consequently, the desired inequality holds for all $k' \ge k$. Assume now that $a_k > 0$ for some
$k \ge 1$. Using the assumptions on $\{a_k\}$ we find the following inequality:

\[ \frac{1}{a_j} \ge \frac{1}{a_{j-1} - \tau a_{j-1}^{3/2}} > 0 \quad \forall j \le k. \]

Taking the square root on both sides of the latter inequality and using the convexity of the scalar
function $t \mapsto 1/\sqrt{t}$, we conclude that

\[ \frac{1}{\sqrt{a_j}} \ge \frac{1}{\sqrt{a_{j-1} - \tau a_{j-1}^{3/2}}} \ge \frac{1}{\sqrt{a_{j-1}}} + \frac{1}{2 a_{j-1}^{3/2}} \, \tau a_{j-1}^{3/2} = \frac{1}{\sqrt{a_{j-1}}} + \frac{\tau}{2}. \]

Adding the above inequality for $j = 1, 2, \ldots, k$ we obtain

\[ \frac{1}{\sqrt{a_k}} \ge \frac{1}{\sqrt{a_0}} + \frac{k\tau}{2}, \]

which in turn gives the desired result.

□
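As a sanity check of Lemma A.1, the following Python sketch builds the sequence that satisfies the hypothesis with equality, $a_k = a_{k-1} - \tau a_{k-1}^{3/2}$, and compares it with the bound of the lemma. The values of `a0` and `tau`, and the function names, are assumptions chosen for illustration (they satisfy $\tau\sqrt{a_0} \le 1$).

```python
import math

def equality_sequence(a0, tau, n):
    """Return [a_0, ..., a_n] with a_k = a_{k-1} - tau * a_{k-1}**1.5."""
    seq = [a0]
    for _ in range(n):
        prev = seq[-1]
        # max(..., 0.0) only guards against tiny negative rounding errors.
        seq.append(max(prev - tau * prev ** 1.5, 0.0))
    return seq

def lemma_bound(a0, tau, k):
    """Right-hand side of Lemma A.1: a_0 / (1 + k*tau*sqrt(a_0)/2)**2."""
    return a0 / (1.0 + k * tau * math.sqrt(a0) / 2.0) ** 2

a0, tau = 1.0, 0.5          # hypothetical data with tau * sqrt(a0) <= 1
seq = equality_sequence(a0, tau, 50)
bounds = [lemma_bound(a0, tau, k) for k in range(51)]

for k in (1, 10, 50):
    print(k, seq[k], bounds[k])
```

Both columns decay like $1/k^2$, which is the mechanism behind the $O(1/n^2)$ complexity estimate.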


A.2 Some examples 

Consider some simple examples where we can explicitly compute the solution $(x, \lambda)$ of the
algebraic-differential system (4), and verify that it is indeed a well-posed system.


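Isotropic linear monotone operator. Let us start with the following simple situation.
Given a positive constant $\alpha > 0$, take $A = \alpha I$, i.e., $Ax = \alpha x$ for every $x \in \mathcal{H}$. One obtains

\[ (\lambda A + I)^{-1} x = \frac{1}{1 + \lambda\alpha}\, x, \tag{91} \]

\[ x - (\lambda A + I)^{-1} x = \frac{\lambda\alpha}{1 + \lambda\alpha}\, x. \tag{92} \]

Given $x_0 \ne 0$, the algebraic-differential system (4) can be written as follows:

\[ \dot{x}(t) + \frac{\alpha\lambda(t)}{1 + \alpha\lambda(t)}\, x(t) = 0, \quad \lambda(t) > 0, \tag{93} \]

\[ \frac{\alpha\lambda(t)^2}{1 + \alpha\lambda(t)}\, \|x(t)\| = \theta, \tag{94} \]

\[ x(0) = x_0. \tag{95} \]

Let us integrate the linear differential equation (93). Set

\[ \Delta(t) := \int_0^t \frac{\alpha\lambda(\tau)}{1 + \alpha\lambda(\tau)}\, d\tau. \tag{96} \]

We have

\[ x(t) = e^{-\Delta(t)} x_0. \tag{97} \]

Equation (94) becomes

\[ \frac{\alpha\lambda(t)^2}{1 + \alpha\lambda(t)}\, e^{-\Delta(t)} = \frac{\theta}{\|x_0\|}. \tag{98} \]

First, check this equation at time $t = 0$. Equivalently,

\[ \frac{\alpha\lambda(0)^2}{1 + \alpha\lambda(0)} = \frac{\theta}{\|x_0\|}. \tag{99} \]

This equation uniquely defines $\lambda(0) > 0$, because the function $\rho \mapsto \alpha\rho^2/(1 + \alpha\rho)$ is strictly increasing
from $[0, +\infty[$ onto $[0, +\infty[$. Thus, the only thing we have to prove is the existence of a positive
function $t \mapsto \lambda(t)$ such that

\[ h(t) := \frac{\alpha\lambda(t)^2}{1 + \alpha\lambda(t)}\, e^{-\Delta(t)} \quad \text{is constant on } [0, +\infty[. \tag{100} \]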
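The closed-loop system (93)-(95) can also be explored numerically. The sketch below treats the scalar case $\mathcal{H} = \mathbb{R}$ and uses an explicit Euler scheme, recomputing $\lambda$ from the algebraic constraint (94) at every step. The Euler discretization, the step size and the parameter values are assumptions made for illustration; this is not one of the algorithms analyzed in the paper.

```python
import math

def lam_from_norm(alpha, theta, norm_x):
    """Positive root lam of alpha*lam**2/(1 + alpha*lam) * norm_x = theta, cf. (94)."""
    c = theta / norm_x
    # The constraint is the quadratic alpha*lam**2 - c*alpha*lam - c = 0.
    return (alpha * c + math.sqrt((alpha * c) ** 2 + 4.0 * alpha * c)) / (2.0 * alpha)

def simulate(alpha, theta, x0, t_end, dt):
    """Explicit Euler for (93), with lam recomputed from (94) at every step."""
    n_steps = int(round(t_end / dt))
    x = x0
    path = []  # tuples (t, x, lam)
    for i in range(n_steps + 1):
        lam = lam_from_norm(alpha, theta, abs(x))
        path.append((i * dt, x, lam))
        x -= dt * alpha * lam / (1.0 + alpha * lam) * x
    return path

# Hypothetical parameters.
path = simulate(alpha=2.0, theta=1.0, x0=1.0, t_end=2.0, dt=1e-3)
for t, x, lam in path[::500]:
    print(f"t={t:.1f}  x={x:.5f}  lam={lam:.5f}")
```

Along the computed trajectory, $|x(t)|$ decreases while $\lambda(t)$ increases, as expected from the closed-loop coupling.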


Writing that the derivative $h'$ is identically zero on $[0, +\infty[$, we obtain that $\lambda(\cdot)$ must satisfy

\[ \lambda'(t)(\alpha\lambda(t) + 2) - \alpha\lambda(t)^2 = 0. \tag{101} \]

After integration of this first-order differential equation, with Cauchy data $\lambda(0)$, we obtain

\[ \alpha \ln \lambda(t) - \frac{2}{\lambda(t)} = \alpha t + \alpha \ln \lambda(0) - \frac{2}{\lambda(0)}. \tag{102} \]

Let us introduce the function $g : ]0, +\infty[ \to \mathbb{R}$,

\[ g(\rho) = \alpha \ln \rho - \frac{2}{\rho}. \tag{103} \]

One can easily verify that, as $\rho$ increases from $0$ to $+\infty$, $g(\rho)$ is strictly increasing from $-\infty$
to $+\infty$. Thus, for each $t \ge 0$, (102) has a unique solution $\lambda(t) > 0$. Moreover, the mapping
$t \mapsto \lambda(t)$ is increasing, continuously differentiable, and $\lim_{t \to \infty} \lambda(t) = +\infty$. Returning to
(102), we obtain that $\lambda(t) \approx e^{t}$ as $t \to +\infty$.
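Since $g$ is strictly increasing, the implicit relation (102) can be solved for $\lambda(t)$ by bisection. The Python sketch below does this and evaluates the residual of the differential equation (101) with a central finite difference. The parameter values, tolerances and function names are assumptions made for illustration.

```python
import math

def g(rho, alpha):
    """The function g of (103): alpha*ln(rho) - 2/rho."""
    return alpha * math.log(rho) - 2.0 / rho

def solve_lambda(t, lam0, alpha):
    """Solve g(lam) = alpha*t + g(lam0) for lam >= lam0 by bisection (t >= 0)."""
    target = alpha * t + g(lam0, alpha)
    lo = hi = lam0
    while g(hi, alpha) < target:   # g is increasing and unbounded above
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid, alpha) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ode_residual(t, lam0, alpha, h=1e-5):
    """Residual lam'(t)*(alpha*lam + 2) - alpha*lam**2 of (101), lam' by central difference."""
    lam = solve_lambda(t, lam0, alpha)
    dlam = (solve_lambda(t + h, lam0, alpha) - solve_lambda(t - h, lam0, alpha)) / (2.0 * h)
    return dlam * (alpha * lam + 2.0) - alpha * lam ** 2

alpha, lam0 = 2.0, 1.5      # hypothetical data
for t in (0.0, 1.0, 2.0, 4.0):
    lam = solve_lambda(t, lam0, alpha)
    print(t, lam, lam * math.exp(-t))
```

The last printed column, $\lambda(t) e^{-t}$, settles towards a positive constant, which illustrates the exponential growth of $\lambda(t)$.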


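Antisymmetric linear monotone operator. Take $\mathcal{H} = \mathbb{R}^2$ and $A$ equal to the rotation
centered at the origin with angle $\pi/2$. The operator $A$ satisfies $A^* = -A$ (anti self-adjoint).
This is a model example of a linear maximal monotone operator which is not self-adjoint. Set
$x = (\xi, \eta) \in \mathbb{R}^2$. We have

\[ Ax = (-\eta, \xi), \tag{104} \]

\[ (\lambda A + I)^{-1} x = \frac{1}{1 + \lambda^2}\, (\xi + \lambda\eta, \; \eta - \lambda\xi), \]

\[ x - (\lambda A + I)^{-1} x = \frac{\lambda}{1 + \lambda^2}\, (\lambda\xi - \eta, \; \lambda\eta + \xi). \tag{105} \]

The condition $\lambda \|(\lambda A + I)^{-1} x - x\| = \theta$ can be reexpressed as

\[ \frac{\lambda^2}{1 + \lambda^2}\, \|(\lambda\xi - \eta, \; \lambda\eta + \xi)\| = \theta. \]

Equivalently,

\[ \frac{\lambda^2}{\sqrt{1 + \lambda^2}}\, \sqrt{\xi^2 + \eta^2} = \theta. \]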
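These formulas are easy to check numerically by identifying $\mathbb{R}^2$ with $\mathbb{C}$, so that the rotation of angle $\pi/2$ becomes multiplication by the imaginary unit. The following Python sketch relies on this identification, which is an assumption made here for convenience; the function names and test values are likewise hypothetical.

```python
import math

def resolvent(x, lam):
    """(I + lam*A)^{-1} x, where A is multiplication by 1j (rotation of angle pi/2)."""
    return x / (1.0 + 1j * lam)

def resolvent_formula(xi, eta, lam):
    """Closed form (xi + lam*eta, eta - lam*xi) / (1 + lam**2), as a complex number."""
    d = 1.0 + lam * lam
    return complex((xi + lam * eta) / d, (eta - lam * xi) / d)

def closed_loop_quantity(x, lam):
    """lam * ||x - (I + lam*A)^{-1} x||, the left-hand side of the algebraic condition."""
    return lam * abs(x - resolvent(x, lam))

xi, eta, lam = 0.7, -1.3, 2.5   # hypothetical values
x = complex(xi, eta)
print(resolvent(x, lam), resolvent_formula(xi, eta, lam))
print(closed_loop_quantity(x, lam), lam ** 2 / math.sqrt(1.0 + lam ** 2) * abs(x))
```

Each printed pair agrees up to rounding, which confirms the resolvent formula and the expression $\lambda^2 \sqrt{\xi^2 + \eta^2} / \sqrt{1 + \lambda^2}$ for this particular choice of values.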


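Given $x_0 \ne 0$, the algebraic-differential system (4) can be written as follows:

\[ \dot{\xi}(t) + \frac{\lambda(t)}{1 + \lambda(t)^2}\, \big( \lambda(t)\xi(t) - \eta(t) \big) = 0, \quad \lambda(t) > 0, \tag{106} \]

\[ \dot{\eta}(t) + \frac{\lambda(t)}{1 + \lambda(t)^2}\, \big( \lambda(t)\eta(t) + \xi(t) \big) = 0, \quad \lambda(t) > 0, \tag{107} \]

\[ \frac{\lambda(t)^2}{\sqrt{1 + \lambda(t)^2}}\, \sqrt{\xi(t)^2 + \eta(t)^2} = \theta, \tag{108} \]

\[ x(0) = x_0. \tag{109} \]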

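Set $u(t) = \xi(t)^2 + \eta(t)^2$. After multiplying (106) by $\xi(t)$, multiplying (107) by $\eta(t)$, and then
adding the results, we obtain

\[ u'(t) + \frac{2\lambda(t)^2}{1 + \lambda(t)^2}\, u(t) = 0. \]

Set

\[ \Delta(t) := \int_0^t \frac{\lambda(\tau)^2}{1 + \lambda(\tau)^2}\, d\tau. \tag{110} \]

We have

\[ u(t) = e^{-2\Delta(t)} u(0). \tag{111} \]

Equation (108) becomes

\[ \frac{\lambda(t)^2}{\sqrt{1 + \lambda(t)^2}}\, e^{-\Delta(t)} = \frac{\theta}{\|x_0\|}. \tag{112} \]

First, check this equation at time $t = 0$. Equivalently,

\[ \frac{\lambda(0)^2}{\sqrt{1 + \lambda(0)^2}} = \frac{\theta}{\|x_0\|}. \tag{113} \]

This equation uniquely defines $\lambda(0) > 0$, because the function $\rho \mapsto \rho^2/\sqrt{1 + \rho^2}$ is strictly increasing
from $[0, +\infty[$ onto $[0, +\infty[$. Thus, the only thing we have to prove is the existence of a positive
function $t \mapsto \lambda(t)$ such that

\[ h(t) := \frac{\lambda(t)^2}{\sqrt{1 + \lambda(t)^2}}\, e^{-\Delta(t)} \quad \text{is constant on } [0, +\infty[. \tag{114} \]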
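Equation (113) for the initial value $\lambda(0)$ can be solved in closed form: squaring it gives a quadratic equation in $\lambda(0)^2$. The short Python sketch below implements this computation; the function name and the sample data are assumptions made for illustration.

```python
import math

def initial_lambda(theta, norm_x0):
    """Positive solution lam of lam**2 / sqrt(1 + lam**2) = theta / norm_x0, cf. (113)."""
    c = theta / norm_x0
    # Squaring gives lam**4 - c**2 * lam**2 - c**2 = 0, a quadratic in lam**2.
    lam_sq = (c * c + math.sqrt(c ** 4 + 4.0 * c * c)) / 2.0
    return math.sqrt(lam_sq)

# Hypothetical pairs (theta, ||x_0||).
for theta, norm_x0 in [(1.0, 1.0), (0.3, 2.0), (5.0, 0.1)]:
    lam = initial_lambda(theta, norm_x0)
    print(theta, norm_x0, lam, lam ** 2 / math.sqrt(1.0 + lam ** 2))
```

The last printed column reproduces $\theta / \|x_0\|$, and a larger ratio $\theta / \|x_0\|$ yields a larger $\lambda(0)$, consistent with the monotonicity argument above.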


Writing that the derivative $h'$ is identically zero on $[0, +\infty[$, we obtain that $\lambda(\cdot)$ must satisfy

\[ \lambda'(t)\big( 2\lambda(t) + \lambda(t)^3 \big) - \lambda(t)^4 = 0. \tag{115} \]

After integration of this first-order differential equation, with Cauchy data $\lambda(0)$, we obtain

\[ \ln \lambda(t) - \frac{1}{\lambda(t)^2} = t + \ln \lambda(0) - \frac{1}{\lambda(0)^2}. \tag{116} \]

Let us introduce the function $g : ]0, +\infty[ \to \mathbb{R}$,

\[ g(\rho) = \ln \rho - \frac{1}{\rho^2}. \tag{117} \]

As $\rho$ increases from $0$ to $+\infty$, $g(\rho)$ is strictly increasing from $-\infty$ to $+\infty$. Thus, for each
$t \ge 0$, (116) has a unique solution $\lambda(t) > 0$. Moreover, the mapping $t \mapsto \lambda(t)$ is increasing,
continuously differentiable, and $\lim_{t \to \infty} \lambda(t) = +\infty$. Returning to (116), we obtain that
$\lambda(t) \approx e^{t}$ as $t \to +\infty$.